Predicate abstraction provides a powerful tool for verifying properties of infinite-state systems
using a combination of a decision procedure for a subset of first-order logic and symbolic methods 
originally developed for finite-state model checking. We consider models containing first-order 
state variables, where the system state includes mutable functions and predicates. Such a model 
can describe systems containing arbitrarily large memories, buffers, and arrays of identical pro- 
cesses. We describe a form of predicate abstraction that constructs a formula over a set of 
universally quantified variables to describe invariant properties of the first-order state variables. 
We provide a formal justification of the soundness of our approach and describe how it has been 
used to verify several hardware and software designs, including a directory-based cache coherence 
protocol. 

Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and 
Reasoning about Programs — Invariants 

General Terms: Verification, Predicate Abstraction 

Additional Key Words and Phrases: formal verification, invariant synthesis, infinite-state verifi- 
cation, abstract interpretation, cache-coherence protocols 



1. INTRODUCTION 

Graf and Saidi introduced predicate abstraction [Graf and Saidi 1997] as a means of au- 
tomatically determining invariant properties of infinite-state systems. With this approach, 
the user provides a set of k Boolean formulas describing possible properties of the system 
state. These predicates are used to generate a finite state abstraction (containing at most
$2^k$ states) of the system. By performing a reachability analysis of this finite-state model, a
predicate abstraction tool can generate the strongest possible invariant for the system ex- 
pressible in terms of this set of predicates. Prior implementations of predicate abstraction 
[Graf and Saidi 1997; Saidi and Shankar 1999; Das et al. 1999; Das and Dill 2001; Ball 
et al. 2001; Flanagan and Qadeer 2002; Chaki et al. 2003] required making a large num- 
ber of calls to a theorem prover or first-order decision procedure, and hence could only be 
applied to cases where the number of predicates was small. More recently, we have shown 
that both BDD and SAT-based Boolean methods can be applied to perform the analysis
efficiently [Lahiri et al. 2003].

In most formulations of predicate abstraction, the predicates contain no free variables, and 
hence they evaluate to true or false for each system state. The abstraction function $\alpha$ has
a simple form, mapping each concrete system state to a single abstract state based on
the effect of evaluating the k predicates. The task of predicate abstraction is to construct
a formula $\psi^*$ consisting of some Boolean combination of the predicates such that $\psi^*(s)$
holds for every reachable system state s.

To verify systems containing unbounded resources, such as buffers and memories of arbi- 
trary size and systems with arbitrary numbers of identical, concurrent processes, the system 
model must support first-order state variables, in which the state variables are themselves 
functions or predicates [Ip and Dill 1996; Bryant et al. 2002b]. For example, a memory 
can be represented as a function mapping an address to the data stored at an address, while 
a buffer can be represented as a function mapping an integer index to the value stored at the 
specified buffer position. The state elements of a set of identical processes can be modeled 
as functions mapping an integer process identifier to the state element for the specified 
process. In many systems, this capability is restricted to arrays that can be altered only by 
writing to a single location [Burch and Dill 1994; McMillan 1998]. Our verifier allows a 
more general form of mutable function, where the updating operation is expressed using 
lambda notation. 

In verifying systems with first-order state variables, we require quantified predicates to de- 
scribe global properties of state variables, such as "At most one process is in its critical 
section," as expressed by the formula $\forall i, j : \mathit{crit}(i) \land \mathit{crit}(j) \Rightarrow i = j$. Conventional
predicate abstraction restricts the scope of a quantifier to within an individual predicate. 
System invariants often involve complex formulas with widely scoped quantifiers. The 
scoping restriction (the fact that the universal quantifier does not distribute over conjunc- 
tions) implies that these invariants cannot be divided into small, simple predicates. This 
puts a heavy burden on the user to supply predicates that encode intricate sets of properties 
about the system. Recent work attempts to discover quantified predicates automatically 
[Das and Dill 2002], but this is a formidable task. 

In this paper we present an extension of predicate abstraction in which the predicates
include free variables from a set of index variables $\mathcal{X}$ (hence the name indexed
predicates). The predicate abstraction engine constructs a formula $\psi^*$ consisting of a Boolean
combination of these predicates, such that the formula $\forall \mathcal{X}\, \psi^*(s)$ holds for every
reachable system state s. With this method, the predicates can be very simple, with the
predicate abstraction tool constructing complex, quantified invariant formulas. For example,
the property that at most one process can be in its critical section could be derived by
supplying predicates crit(i), crit(j), and i = j, where i and j are the index
symbols. Encoding these predicates in the abstract system with Boolean variables $c_i$, $c_j$, and
$e_{ij}$, respectively, we can verify this property by using predicate abstraction to prove that
$c_i \land c_j \Rightarrow e_{ij}$ holds for every reachable state of the abstract system.
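
To make the role of the index symbols concrete, the following minimal Python sketch evaluates the three indexed predicates on a single concrete state for every valuation of i and j, and checks the implication between them. The dictionary encoding of crit, the three-process instance, and the function name are assumptions made for this illustration; they are not part of the verifier described in this paper.

```python
from itertools import product

def holds_for_all_indices(crit, process_ids):
    """Check (c_i and c_j) implies e_ij for every valuation of the index symbols i, j."""
    for i, j in product(process_ids, repeat=2):
        # Valuations of the three indexed predicates crit(i), crit(j), i = j.
        c_i, c_j, e_ij = crit[i], crit[j], (i == j)
        if c_i and c_j and not e_ij:
            return False
    return True

# Hypothetical concrete states for three processes: crit maps a process id to a Boolean.
one_in_cs = {0: False, 1: True, 2: False}
two_in_cs = {0: True, 1: True, 2: False}

print(holds_for_all_indices(one_in_cs, range(3)))
print(holds_for_all_indices(two_in_cs, range(3)))
```

The first state satisfies the quantified property and the second violates it (take i = 0 and j = 1), even though each individual predicate is a simple, quantifier-free formula.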

Flanagan and Qadeer use a method similar to ours [Flanagan and Qadeer 2002], and we 
briefly described our method in an earlier paper [Lahiri et al. 2003]. Our contribution in 
this paper is to describe the method more carefully, explore its properties, and to provide 
a formal argument for its soundness. The key idea of our approach is to formulate the 
abstraction function $\alpha$ to map a concrete system state s to the set of all possible valuations
of the predicates, considering the set of possible values for the index variables $\mathcal{X}$. The
resulting abstract system is unusual; it is not characterized by a state transition relation and
hence cannot be viewed as a state transition system. Nonetheless, it provides an abstract
interpretation of the concrete system [Cousot and Cousot 1977] and hence can be used to
find invariant system properties. 

Assuming a decision procedure that can determine the satisfiability of a formula with uni- 
versal quantifiers, we prove the following completeness result: Predicate abstraction can 
prove any property that can be proved by induction on the state sequence using an induction 
hypothesis expressed as a universally quantified formula over the given set of predicates. 
For many modeling logics, this decision problem is undecidable. By using quantifier in- 
stantiation, we can implement a sound, but incomplete verifier. As an extension, we show 
that it is easy to incorporate axioms into the system, properties that must hold universally 
for every system state. Axioms can be viewed simply as indexed predicates that must 
evaluate to true on every step. 

The ideas have been implemented in UCLID [Bryant et al. 2002b], a platform for model- 
ing and verifying infinite-state systems. Although we demonstrate the ideas in the context 
of this tool and the logic (CLU) it supports, the ideas developed here are not strongly tied 
to this logic. We conclude the paper by describing our use of predicate abstraction to verify 
several hardware and software systems, including a directory-based cache coherence proto- 
col devised by Steven German [German ]. We believe we are the first to verify the protocol 
for a system with an unbounded number of clients, each communicating via unbounded 
FIFO channels. 

1.1 Related Work 

Verifying systems with unbounded resources is in general undecidable. For instance, the
problem of verifying whether a system of N concurrent processes (where N can be arbitrarily
large) satisfies a property is undecidable [Apt and Kozen 1986]. Despite its complexity, the
problem of verifying systems with arbitrarily large resources (e.g., parameterized systems
with N processes, out-of-order processors with arbitrarily large reorder buffers, software
programs with arbitrarily large arrays) is of significant practical interest. Hence, in recent
years, there has been a lot of interest in developing techniques based on model checking 
and deductive approaches for verifying such systems. 

McMillan uses "compositional model checking" [McMillan 1998] with various built-in 
abstractions to reduce an infinite-state system to a finite state system, which can be model 
checked using Boolean methods. The abstraction mechanisms include temporal case split- 
ting, datatype reduction [Clarke et al. 1992] and symmetry reduction. Temporal case split- 
ting uses heuristics to slice the program space to only consider the resources necessary 
for proving a property. Datatype reduction uses abstract interpretation [Cousot and Cousot 
1977] to abstract unbounded data and operations over them to operations over finite do- 
mains. For such finite domains, datatype reduction is subsumed by predicate abstraction. 
Symmetry is exploited to reduce the number of indices to consider for verifying unbounded 
arrays or network of processes. The method can prove both safety and liveness properties. 
Since the abstraction mechanisms are built into the system, they can often be coarse and 
may not suffice for proving a system. Besides, the user is often required to provide auxiliary
lemmas or to decompose the proof to be discharged by symbolic model checkers. For
instance, the proof of safety of the Bakery protocol [McMillan et al. 2000] or the proof
of an out-of-order processor model [McMillan 1998] required non-trivial lemmas in the 
compositional model checking framework. 

Regular model checking [Kesten et al. 1997; Bouajjani et al. 2000] uses regular languages 
to represent parameterized systems and computes the closure for the regular relations to 
construct the reachable state space. In general, the method is not guaranteed to be complete 
and requires various acceleration techniques (sometimes guided by the user) to ensure
termination. Moreover, approaches based on regular languages are not suited for representing
data in the system. Several examples that we consider in this work cannot be modeled in
this framework, among them the out-of-order processor, which contains data operations, and
Peterson's mutual exclusion algorithm. Even though the Bakery algorithm can be
verified in this framework, it requires considerable user ingenuity to encode the protocol
in a regular language.

Several researchers have investigated restrictions on the system description to make the 
parameterized verification problem decidable. Notable among them is the early work by 
German and Sistla [German and Sistla 1992] for verifying single-indexed properties for 
synchronously communicating systems. For restricted systems, finite "cut-off" based ap- 
proaches [Emerson and Namjoshi 1995; Emerson and Kahlon 2000; 2003] reduce the prob- 
lem to verifying networks of some fixed finite size. These bounds have been established 
for verifying restricted classes of ring networks and cache coherence protocols. Emer- 
son and Kahlon [Emerson and Kahlon 2003] have verified the version of German's cache 
coherence protocol with single entry channels by manually reducing it to a snoopy pro- 
tocol, for which finite cut-off exists. However, the reduction is manually performed and 
exploits details of operation of the protocol, and thus requires user ingenuity. It can't be 
easily extended to verify other unbounded systems including the Bakery algorithm or the 
out-of-order processors. 

The method of "invisible invariants" [Pnueli et al. 2001; Arons et al. 2001] uses heuristics 
for constructing universally quantified invariants for parameterized systems automatically. 
The method computes the set of reachable states for finite (and small) instances of the 
parameters and then generalizes them to parameterized systems to construct a potential 
inductive invariant. They provide an algorithm for checking the verification conditions for
a restricted class of systems called stratified systems, which include German's protocol
with single entry channels and Lamport's Bakery protocol [Lamport 1974]. However, the
method simply becomes a heuristic for generating candidate invariants for non-stratified
systems, which include Peterson's mutual exclusion algorithm [Peterson 1981] and the
Ad-hoc On-demand Distance Vector (AODV) [C.Perkins et al. 2002] network protocol.
The class of bounded-data systems (where each variable is finite but parameterized)
considered by this approach cannot model the out-of-order processor model [Lahiri et al.
2002] that we can verify using our method.

Predicate abstraction with locally quantified predicates [Das and Dill 2002; Baukus et al.
2002] requires complex quantified predicates to construct the inductive assertions, as
mentioned in the introduction. These predicates are often as complex as the invariants themselves.
In fact, some of the invariants are used as predicates in [Baukus et al. 2002] to derive
inductive invariants. The method in [Baukus et al. 2002] verified (both safety and liveness) a
version of the cache coherence protocol with single entry channels, with complex manually
provided predicates. Baukus et al. [Baukus et al. 2002] use the logic WS1S (weak
second-order logic with one successor) [Büchi 1960; Thomas 1990], which does not allow
function symbols and thus cannot model the out-of-order processor model. The automatic
predicate discovery methods for quantified predicates [Das and Dill 2002] have not been
demonstrated on most examples (except the AODV model) we consider in this paper.

Flanagan and Qadeer [Flanagan and Qadeer 2002] use indexed predicates to synthesize 
loop invariants for sequential software programs that involve unbounded arrays. They also 
provide heuristics to extract some of the predicates from the program text automatically. 
The heuristics are specific to loops in sequential software and not suited for verifying more 
general unbounded systems that we handle in this paper. In this work, we explore formal 
properties of this formulation and apply it for verifying distributed systems. In a recent 
work [Lahiri and Bryant 2004], we provide a weakest precondition transformer [Dijkstra 
1975] based syntactic heuristic for discovering most of the predicates for many of the 
systems that we consider in this paper. 

2. NOTATION 

Rather than using the common indexed vector notation to represent collections of values
(e.g., $\vec{v} = \langle v_1, v_2, \ldots, v_n \rangle$), we use a named set notation. That is, for a set of
symbols $\mathcal{A}$, we let $v$ indicate a set consisting of a value $v_x$ for each $x \in \mathcal{A}$.

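For a set of symbols $\mathcal{A}$, let $\sigma_{\mathcal{A}}$ denote an interpretation of these symbols, assigning to
each symbol $x \in \mathcal{A}$ a value $\sigma_{\mathcal{A}}(x)$ of the appropriate type (Boolean, integer, function, or
predicate). Let $\Sigma_{\mathcal{A}}$ denote the set of all interpretations $\sigma_{\mathcal{A}}$ over the symbol set $\mathcal{A}$.

For interpretations $\sigma_{\mathcal{A}}$ and $\sigma_{\mathcal{B}}$ over disjoint symbol sets $\mathcal{A}$ and $\mathcal{B}$, let
$\sigma_{\mathcal{A}} \cdot \sigma_{\mathcal{B}}$ denote an interpretation assigning either $\sigma_{\mathcal{A}}(x)$ or $\sigma_{\mathcal{B}}(x)$ to each
symbol $x \in \mathcal{A} \cup \mathcal{B}$, according to whether $x \in \mathcal{A}$ or $x \in \mathcal{B}$.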

Figure 1 displays the syntax of the Logic of Counter arithmetic with Lambda expressions 
and Uninterpreted functions (CLU), a fragment of first-order logic extended with equality, 
inequality, and counters. An expression in CLU can evaluate to truth values (bool-expr), 
integers (int-expr), functions (function-expr) or predicates (predicate-expr). Notice that
we only allow restricted arithmetic on terms, namely that of addition or subtraction by 
constants. Notice that we restrict the parameters to a lambda expression to be integers, and 
not function or predicate expressions. There is no way in our logic to express any form of 
iteration or recursion. 

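For symbol set $\mathcal{A}$, let $E(\mathcal{A})$ denote the set of all CLU expressions over $\mathcal{A}$. For any
expression $\phi \in E(\mathcal{A})$ and interpretation $\sigma_{\mathcal{A}} \in \Sigma_{\mathcal{A}}$, let the valuation of $\phi$ with
respect to $\sigma_{\mathcal{A}}$, denoted $\langle \phi \rangle_{\sigma_{\mathcal{A}}}$, be the (Boolean, integer, function, or predicate)
value obtained by evaluating $\phi$ when each symbol $x \in \mathcal{A}$ is replaced by its interpretation
$\sigma_{\mathcal{A}}(x)$.

Let $v$ be a named set over symbols $\mathcal{A}$, consisting of expressions over symbol set $\mathcal{B}$. That
is, $v_x \in E(\mathcal{B})$ for each $x \in \mathcal{A}$. Given an interpretation $\sigma_{\mathcal{B}}$ of the symbols in $\mathcal{B}$,
evaluating the expressions in $v$ defines an interpretation of the symbols in $\mathcal{A}$, which we denote
$\langle v \rangle_{\sigma_{\mathcal{B}}}$. That is, $\langle v \rangle_{\sigma_{\mathcal{B}}}$ is an interpretation $\sigma_{\mathcal{A}}$ such that
$\sigma_{\mathcal{A}}(x) = \langle v_x \rangle_{\sigma_{\mathcal{B}}}$ for each $x \in \mathcal{A}$.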

    bool-expr      ::= true | false | bool-symbol
                     | ¬bool-expr | (bool-expr ∧ bool-expr)
                     | (int-expr = int-expr) | (int-expr < int-expr)
                     | predicate-expr(int-expr, ..., int-expr)
    int-expr       ::= lambda-var | int-symbol
                     | ITE(bool-expr, int-expr, int-expr)
                     | int-expr + int-constant
                     | function-expr(int-expr, ..., int-expr)
    predicate-expr ::= predicate-symbol | λ lambda-var, ..., lambda-var . bool-expr
    function-expr  ::= function-symbol | λ lambda-var, ..., lambda-var . int-expr

Fig. 1. CLU Expression Syntax. Expressions can denote computations of Boolean values, integers, or functions
yielding Boolean values or integers.

A substitution $\pi$ for a set of symbols $\mathcal{A}$ is a named set of expressions over some set of
symbols $\mathcal{B}$ (with no restriction on the relation between $\mathcal{A}$ and $\mathcal{B}$). That is, for each
$x \in \mathcal{A}$, there is an expression $\pi_x \in E(\mathcal{B})$. For an expression $\psi \in E(\mathcal{A} \cup \mathcal{C})$, we let
$\psi[\pi/\mathcal{A}]$ denote the expression $\psi' \in E(\mathcal{B} \cup \mathcal{C})$ resulting when we replace each
occurrence of each symbol $x \in \mathcal{A}$ with the expression $\pi_x$. These replacements are all
performed simultaneously.

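PROPOSITION 2.1. Let $\psi$ be an expression in $E(\mathcal{A} \cup \mathcal{C})$ and $\pi$ be a substitution having
$\pi_x \in E(\mathcal{B})$ for each $x \in \mathcal{A}$. For interpretations $\sigma_{\mathcal{B}}$ and $\sigma_{\mathcal{C}}$, if we let $\sigma_{\mathcal{A}}$ be the
interpretation defined as $\sigma_{\mathcal{A}} = \langle \pi \rangle_{\sigma_{\mathcal{B}}}$, then
$\langle \psi \rangle_{\sigma_{\mathcal{A}} \cdot \sigma_{\mathcal{C}}} = \langle \psi[\pi/\mathcal{A}] \rangle_{\sigma_{\mathcal{B}} \cdot \sigma_{\mathcal{C}}}$.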

This proposition captures a fundamental relation between syntactic substitution and expres- 
sion evaluation, sometimes referred to as referential transparency. We can interchangeably 
use a subexpression $\pi_x$ or the result of evaluating this subexpression $\sigma_{\mathcal{A}}(x)$ in evaluating a
formula containing this subexpression. 
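
The relation stated in Proposition 2.1 can be illustrated on a small fragment of CLU. The sketch below is an assumption-laden toy, not the UCLID implementation: expressions are encoded as nested Python tuples, only integer symbols, addition of a constant, comparison, Boolean connectives, and ITE are covered, and the names evaluate and substitute are introduced here for illustration. It checks that evaluating $\psi$ under $\sigma_{\mathcal{A}} \cdot \sigma_{\mathcal{C}}$ agrees with evaluating $\psi[\pi/\mathcal{A}]$ under $\sigma_{\mathcal{B}} \cdot \sigma_{\mathcal{C}}$.

```python
def evaluate(expr, sigma):
    """Valuation of an expression under an interpretation (a dict from symbols to values)."""
    tag = expr[0]
    if tag == "sym":
        return sigma[expr[1]]
    if tag == "add":                       # int-expr + int-constant
        return evaluate(expr[1], sigma) + expr[2]
    if tag == "eq":
        return evaluate(expr[1], sigma) == evaluate(expr[2], sigma)
    if tag == "lt":
        return evaluate(expr[1], sigma) < evaluate(expr[2], sigma)
    if tag == "not":
        return not evaluate(expr[1], sigma)
    if tag == "and":
        return evaluate(expr[1], sigma) and evaluate(expr[2], sigma)
    if tag == "ite":
        branch = expr[2] if evaluate(expr[1], sigma) else expr[3]
        return evaluate(branch, sigma)
    raise ValueError(f"unknown expression tag: {tag}")

def substitute(expr, pi):
    """Simultaneously replace every symbol x in the domain of pi by the expression pi[x]."""
    tag = expr[0]
    if tag == "sym":
        return pi.get(expr[1], expr)
    if tag == "add":
        return ("add", substitute(expr[1], pi), expr[2])
    return (tag,) + tuple(substitute(sub, pi) for sub in expr[1:])

# psi = ITE(a < c, a + 1, c) over A = {a}, C = {c}; pi_a = b + 2 over B = {b}.
psi = ("ite", ("lt", ("sym", "a"), ("sym", "c")), ("add", ("sym", "a"), 1), ("sym", "c"))
pi = {"a": ("add", ("sym", "b"), 2)}
sigma_B, sigma_C = {"b": 3}, {"c": 7}

sigma_A = {x: evaluate(e, sigma_B) for x, e in pi.items()}     # sigma_A = <pi>_{sigma_B}
lhs = evaluate(psi, {**sigma_A, **sigma_C})                    # <psi>_{sigma_A . sigma_C}
rhs = evaluate(substitute(psi, pi), {**sigma_B, **sigma_C})    # <psi[pi/A]>_{sigma_B . sigma_C}
print(lhs, rhs)
```

Both sides evaluate to 6 for these interpretations; the agreement for one example is of course only an illustration of the proposition, not a proof.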



3. SYSTEM MODEL 

We model the system as having a number of state elements, where each state element 
may be a Boolean or integer value, or a function or predicate. We use symbolic names to 
represent the different state elements giving the set of state symbols $\mathcal{V}$. We introduce a set
of initial state symbols $\mathcal{J}$ and a set of input symbols $\mathcal{I}$ representing, respectively, initial
values and inputs that can be set to arbitrary values on each step of operation. Among 
the state variables, there can be immutable values expressing the behavior of functional 
units, such as ALUs, and system parameters such as the total number of processes or the 
maximum size of a buffer. Since these values are expressed symbolically, one run of the 
verifier can prove the correctness of the system for arbitrary functionalities, process counts, 
and buffer capacities. 

The overall system operation is characterized by an initial-state expression set $q^0$ and a
next-state expression set $\delta$. The initial state consists of an expression for each state element,
with the initial value of state element x given by expression $q^0_x \in E(\mathcal{J})$. The transition
behavior also consists of an expression for each state element, with the behavior for state
element x given by expression $\delta_x \in E(\mathcal{V} \cup \mathcal{I})$. In this expression, the state element symbols
represent the current system state and the input symbols represent the current values of the 
inputs. The expression gives the new value for that state element.

We will use a very simple system as a running example throughout this presentation. The
only state element is a function F, i.e., $\mathcal{V} = \{F\}$. An input symbol i determines which
element of F is updated. Initially, F is the identity function:

\[ q^0_F = \lambda u \,.\, u. \]

On each step, the value of the function for argument i is updated to be F(i + 1). That is,

\[ \delta_F = \lambda u \,.\, \mathit{ITE}(u = i,\ F(i + 1),\ F(u)) \]

where the if-then-else operation ITE selects its second argument when the first one evaluates
to true and the third otherwise. For the above example, $\mathcal{J} = \{\}$ and $\mathcal{I} = \{i\}$.
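
The running example translates almost literally into closures. The following Python sketch is only an illustration of the initial-state and next-state expressions; the function names initial_state and next_state and the particular input values are assumptions made for this example.

```python
def initial_state():
    """Initial-state expression for F: the identity function, lambda u . u."""
    return lambda u: u

def next_state(F, i):
    """Next-state expression for F: lambda u . ITE(u = i, F(i + 1), F(u))."""
    return lambda u: F(i + 1) if u == i else F(u)

F0 = initial_state()
F1 = next_state(F0, 3)   # input i = 3: entry 3 is overwritten with F0(4)
F2 = next_state(F1, 2)   # input i = 2: entry 2 is overwritten with F1(3)

print([F2(u) for u in range(6)])
```

After the two steps the function maps 0..5 to 0, 1, 4, 4, 4, 5: only the arguments selected by the inputs have changed, and each new state is again a total function on the integers.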

3.1 Concrete System 

A concrete system state assigns an interpretation to every state symbol. The set of states of
the concrete system is given by $\Sigma_{\mathcal{V}}$, the set of interpretations of the state element symbols.
For convenience, we denote concrete states using letters s and t rather than the more formal
$\sigma_{\mathcal{V}}$.

From our system model, we can characterize the behavior of the concrete system in terms
of an initial state set $Q^0_C \subseteq \Sigma_{\mathcal{V}}$ and a next-state function operating on sets
$N_C : \wp(\Sigma_{\mathcal{V}}) \rightarrow \wp(\Sigma_{\mathcal{V}})$. The initial state set is defined as:

\[ Q^0_C = \{ \langle q^0 \rangle_{\sigma_{\mathcal{J}}} \mid \sigma_{\mathcal{J}} \in \Sigma_{\mathcal{J}} \}, \]

i.e., the set of all possible valuations of the initial state expressions. The next-state function
$N_C$ is defined for a single state s as:

\[ N_C(s) = \{ \langle \delta \rangle_{s \cdot \sigma_{\mathcal{I}}} \mid \sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}} \}, \]

i.e., the set of all valuations of the next-state expressions for concrete state s and arbitrary
input. The function is then extended to sets of states by defining

\[ N_C(S_C) = \bigcup_{s \in S_C} N_C(s). \]

We can also characterize the next-state behavior of the concrete system by a transition
relation T where $(s, t) \in T$ when $t \in N_C(s)$.

We define the set of reachable states $R_C$ as containing those states s such that there is some
state sequence $s_0, s_1, \ldots, s_n$ with $s_0 \in Q^0_C$, $s_n = s$, and $s_{i+1} \in N_C(s_i)$ for all values of
i such that $0 \le i < n$. We define the depth of a reachable state s to be the length n of the
shortest sequence leading to s. Since our concrete system has an infinite number of states, 
there is no finite bound on the maximum depth over all reachable states. 

With our example system, the concrete state set consists of integer functions f such that
$f(u + 1) \ge f(u) \ge u$ for all u and $f(u) = u$ for infinitely many arguments of f.
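
The ordering property just stated can be spot-checked by brute force on a bounded fragment of the example system. The sketch below is an assumption-laden test harness, not a proof: it applies every input sequence of length at most three, with inputs drawn from a small assumed range, and tests $f(u + 1) \ge f(u) \ge u$ on a finite window of arguments. The condition on infinitely many fixed points cannot be tested this way and is left out.

```python
from itertools import product

def next_state(F, i):
    """Next-state expression of the running example."""
    return lambda u: F(i + 1) if u == i else F(u)

def satisfies_property(f, window):
    """Check f(u + 1) >= f(u) >= u on a finite window of arguments."""
    return all(f(u + 1) >= f(u) >= u for u in window)

def check_bounded(max_depth, inputs, window):
    """Apply every input sequence of length <= max_depth to the identity function."""
    checked = 0
    for depth in range(max_depth + 1):
        for seq in product(inputs, repeat=depth):
            f = lambda u: u
            for i in seq:
                f = next_state(f, i)
            if not satisfies_property(f, window):
                return False, checked
            checked += 1
    return True, checked

ok, count = check_bounded(max_depth=3, inputs=range(-2, 3), window=range(-4, 5))
print(ok, count)
```

With five possible inputs and depth at most three, the harness visits 1 + 5 + 25 + 125 = 156 input sequences and finds no violation, which is consistent with the characterization above.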

4. PREDICATE ABSTRACTION WITH INDEXED PREDICATES 

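We use indexed predicates to express constraints on the system state. To define the abstract
state space, we introduce a set of predicate symbols $\mathcal{P}$ and a set of index symbols $\mathcal{X}$. The
predicates consist of a named set $\phi$, where for each $p \in \mathcal{P}$, predicate $\phi_p$ is a Boolean
formula over the symbols in $\mathcal{V} \cup \mathcal{X}$.

    Abstract System                                  Concrete System
    Formula            State Set                     System Property                                 State Set
    $\psi$             $S_A = \langle\psi\rangle$    (quantified over x)                             $S_C = \gamma(S_A)$
    ----------------------------------------------------------------------------------------------------------------------------
    $p \land q$        {TT}                          $\forall x : f(x) \ge 0 \land x \ge 0$          $\emptyset$
    $p \land \neg q$   {TF}                          $\forall x : f(x) \ge 0 \land x < 0$            $\emptyset$
    $\neg q$           {FF, TF}                      $\forall x : x < 0$                             $\emptyset$
    $p$                {TF, TT}                      $\forall x : f(x) \ge 0$                        $\{ f \mid f(x) \ge 0 \}$
    $p \lor \neg q$    {FF, TF, TT}                  $\forall x : x \ge 0 \Rightarrow f(x) \ge 0$    $\{ f \mid x \ge 0 \Rightarrow f(x) \ge 0 \}$

Table I. Example abstract state sets and their concretizations. Abstract state elements are represented by their
interpretations of p and q.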

Our predicates define an abstract state space $\Sigma_{\mathcal{P}}$, consisting of all interpretations $\sigma_{\mathcal{P}}$ of the
predicate symbols. For $k = |\mathcal{P}|$, the state space contains $2^k$ elements.

As an illustration, suppose for our example system we wish to prove that state element F
will always be a function f satisfying $f(u) \ge 0$ for all $u \ge 0$. We introduce an index
variable x and predicate symbols $\mathcal{P} = \{p, q\}$, with $\phi_p = F(x) \ge 0$ and $\phi_q = x \ge 0$.

We can denote a set of abstract states by a Boolean formula $\psi \in E(\mathcal{P})$. This expression
defines a set of states $\langle \psi \rangle = \{ \sigma_{\mathcal{P}} \mid \langle \psi \rangle_{\sigma_{\mathcal{P}}} = \mathbf{true} \}$. As an example, our two predicates
$\phi_p$ and $\phi_q$ generate an abstract space consisting of four elements, which we denote FF,
FT, TF, and TT, according to the interpretations assigned to p and q. There are then 16 
possible abstract state sets, some of which are shown in Table I. In this table, abstract state 
sets are represented both by Boolean formulas over p and q, and by enumerations of the 
state elements. 

We define the abstraction function $\alpha$ to map each concrete state to the set of abstract states
given by the valuations of the predicates for all possible values of the index variables:

\[ \alpha(s) = \{ \langle \phi \rangle_{s \cdot \sigma_{\mathcal{X}}} \mid \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}} \} \tag{1} \]

\[ \phantom{\alpha(s)} = \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \{ \langle \phi \rangle_{s \cdot \sigma_{\mathcal{X}}} \} \tag{2} \]

Note that (2) is simply a restatement of (1) using set union notation.

Since there are multiple interpretations $\sigma_{\mathcal{X}}$, a single concrete state will generally map to
multiple abstract states. Figure 2 illustrates this fact. The abstraction function $\alpha$ maps a
single concrete state s to a set of abstract states, each abstract state $\langle \phi \rangle_{s \cdot \sigma_{\mathcal{X}}}$ resulting
from some interpretation $\sigma_{\mathcal{X}}$. This feature is not found in most uses of predicate
abstraction, but it is the key idea for handling indexed predicates.

Working with our example system, consider the concrete state given by the function $\lambda u \,.\, u$,
in Figure 3. When we abstract this function relative to predicates $\phi_p$ and $\phi_q$, we get two
abstract states: TT, when $x \ge 0$, and FF, when $x < 0$. This abstract state set is then
characterized by the formula $p \Leftrightarrow q$.

[Figure: the abstract domain, containing an abstract state set $S_A$, and the concrete domain,
containing $\gamma(S_A)$, linked by abstraction and concretization arrows.]

Fig. 2. Abstraction and Concretization.
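
The abstraction of a single concrete state under the predicates $\phi_p$ and $\phi_q$ can be reproduced with a few lines of Python. This is only a sketch under a stated assumption: the index variable x ranges over a finite window of integers rather than over all integers, which suffices here because the window contains both negative and nonnegative values. The second function, $\lambda u \,.\, u + 3$, is a hypothetical state chosen only to show a larger abstraction.

```python
def phi(F, x):
    """Valuation of the predicates (phi_p, phi_q) = (F(x) >= 0, x >= 0) for one index value."""
    return (F(x) >= 0, x >= 0)

def alpha(F, index_values):
    """Equation (1), restricted to a finite set of index values (an assumption of this sketch)."""
    return {phi(F, x) for x in index_values}

def label(state):
    """Render an abstract state as in the text, e.g. (True, False) -> 'TF'."""
    return "".join("T" if bit else "F" for bit in state)

window = range(-5, 6)
identity_abs = sorted(label(a) for a in alpha(lambda u: u, window))
shifted_abs = sorted(label(a) for a in alpha(lambda u: u + 3, window))
print(identity_abs)
print(shifted_abs)
```

The identity function abstracts to the two states FF and TT, matching the formula $p \Leftrightarrow q$, while the shifted function additionally produces TF for the indices between -3 and -1.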



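We then extend the abstraction function to apply to sets of concrete states in the usual way:

\[ \alpha(S_C) = \bigcup_{s \in S_C} \alpha(s) \tag{3} \]

\[ \phantom{\alpha(S_C)} = \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \ \bigcup_{s \in S_C} \{ \langle \phi \rangle_{s \cdot \sigma_{\mathcal{X}}} \} \tag{4} \]

Note that (4) follows by combining (2) with (3), and then reordering the unions.

[Figure: the concrete state $\lambda u \,.\, u$ abstracts to the abstract state set {TT, FF}, which
concretizes to the set of functions satisfying $\forall x : F(x) \ge 0 \Leftrightarrow x \ge 0$.]

Fig. 3. Abstraction and Concretization for the initial state for the example.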

PROPOSITION 4.1. For any pair of concrete state sets $S_C$ and $T_C$:

(1) If $S_C \subseteq T_C$, then $\alpha(S_C) \subseteq \alpha(T_C)$.

(2) $\alpha(S_C) \cup \alpha(T_C) = \alpha(S_C \cup T_C)$.

These properties follow directly from the way we extended $\alpha$ from a single concrete state
to a set of concrete states.

We define the concretization function $\gamma$ to require universal quantification over the index
symbols. That is, for a set of abstract states $S_A \subseteq \Sigma_{\mathcal{P}}$, we let $\gamma(S_A)$ be the following set
of concrete states:

\[ \gamma(S_A) = \{ s \mid \forall \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}} : \langle \phi \rangle_{s \cdot \sigma_{\mathcal{X}}} \in S_A \} \tag{5} \]

Consider Figure 2, where a set of abstract states $S_A$ has been concretized to a set of
concrete states $\gamma(S_A)$. It shows a concrete state t that is not included in $\gamma(S_A)$ because
one of the states it abstracts to lies outside $S_A$. On the other hand, the concrete state u is
contained in $\gamma(S_A)$ because $\alpha(u) \subseteq S_A$. One can provide an alternate definition of $\gamma$ as
follows:

\[ \gamma(S_A) = \{ s \mid \alpha(s) \subseteq S_A \} \tag{6} \]

The universal quantifier in the definition of $\gamma$ has the consequence that the concretization
function does not distribute over set union. In particular, we cannot view the concretization
function as operating on individual abstract states, but rather as generating each concrete
state from multiple abstract states.

PROPOSITION 4.2. For any pair of abstract state sets $S_A$ and $T_A$:

(1) If $S_A \subseteq T_A$, then $\gamma(S_A) \subseteq \gamma(T_A)$.

(2) $\gamma(S_A) \cup \gamma(T_A) \subseteq \gamma(S_A \cup T_A)$.

The first property follows from (5), while the second follows from the first.

Consider our example system with predicates $\phi_p$ and $\phi_q$. Table I shows some example
abstract state sets $S_A$ and their concretizations $\gamma(S_A)$. As the first three examples show,
some (altogether 6) nonempty abstract state sets have empty concretizations, because they 
constrain x to be either always negative or always nonnegative. On the other hand, there 
are 9 abstract state sets having nonempty concretizations. We can see by this that the 
concretization function is based on the entire abstract state set and not just on the individual 
values. For example, the sets {TF} and {TT} have empty concretizations, but {TF, TT} 
concretizes to the set of all nonnegative functions. 
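
The counts quoted above can be reproduced on a small finite stand-in for the example. In the sketch below, the index x ranges over the assumed set {-1, 0, 1} and functions take values in the assumed set {-1, 1}; these restrictions are purely for enumeration and are not part of the model. The concretization follows definition (6): a function belongs to $\gamma(S_A)$ exactly when every abstract state it maps to lies in $S_A$.

```python
from itertools import combinations, product

X = [-1, 0, 1]      # finite stand-in for the index values (assumption of this sketch)
VALUES = [-1, 1]    # finite stand-in for the function values (assumption of this sketch)

FUNCTIONS = [dict(zip(X, vals)) for vals in product(VALUES, repeat=len(X))]
ABSTRACT_STATES = list(product([False, True], repeat=2))   # interpretations of (p, q)

def alpha(f):
    """Abstract states of f under phi_p = f(x) >= 0 and phi_q = x >= 0."""
    return frozenset((f[x] >= 0, x >= 0) for x in X)

def gamma(S_A):
    """Definition (6): the functions whose entire abstraction lies inside S_A."""
    return [f for f in FUNCTIONS if alpha(f) <= S_A]

all_sets = [frozenset(c) for r in range(len(ABSTRACT_STATES) + 1)
            for c in combinations(ABSTRACT_STATES, r)]
nonempty_gamma = [S for S in all_sets if gamma(S)]
empty_gamma_nonempty_set = [S for S in all_sets if S and not gamma(S)]

TF, TT = (True, False), (True, True)
print(len(all_sets), len(nonempty_gamma), len(empty_gamma_nonempty_set))
print(gamma(frozenset({TF})), gamma(frozenset({TT})))
print(gamma(frozenset({TF, TT})))
```

Of the 16 abstract state sets, 9 have a nonempty concretization and 6 nonempty sets have an empty one. The sets {TF} and {TT} concretize to nothing, whereas their union concretizes to the single everywhere-nonnegative function of this finite universe, illustrating that $\gamma$ does not distribute over union.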

THEOREM 4.3. The functions $(\alpha, \gamma)$ form a Galois connection, i.e., for any sets of
concrete states $S_C$ and abstract states $S_A$:

\[ \alpha(S_C) \subseteq S_A \iff S_C \subseteq \gamma(S_A) \tag{7} \]



PROOF. (This is one of several logically equivalent formulations of a Galois connection 
[Cousot and Cousot 1977].) The proof follows by observing that both the left and the 
right-hand sides of (7) hold precisely when for every ax G and every s G 5c we have 
{4>) s . ax G S A . Let us prove the two directions: 

(1) If: Let a(Sc) Q S A . By the definition of a in (1), this implies that for every s G 
Sc and for interpretation ax G Y*x, {<t>) s -<y x e By the definition of 7 in (5), 
7(5a) contains precisely those concrete states s' for which {4>) s i. ax G 5,4, for every 
interpretation ax G Y*x- Thus, for every s G 5c, s G 7(5,4 ) and consequently, 

5 C C 7(5,4). 
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(2) Only if: Let S c C j{S A ). Hence, by (5), for every s £ S c , (4>) s -o x e S A, for 
every interpretation ax G By the definition of a in (1), a(s) 6 5a- Further, by 
extending a for the entire set Sc by (3), we get a(Sc) C S 1 ^. 



Alternately, the functions (a, 7) form a Galois connection if they satisfy the following 
properties for any sets of concrete states Sc and abstract states Sa- 



These properties can be derived from (7). Similarly, (7) can be derived from (8) and (9). 
The containment relation in both (8) and (9) can be proper. For example, the concrete state 
set consisting of the single function Xu . u abstracts to the state set p <*=> q, which in turn 
concretizes to the set of all functions / such that f(u) > u > 0, for any argument u. 
This is clearly demonstrated in Fig 3. On the other hand, consider the set of abstract states 
represented by p A q. This set of abstract states has an empty concretization (see Table I), 
and thereby satisfies 0(7(5,4)) C Sa- 

5. ABSTRACT SYSTEM 

Predicate abstraction involves performing a reachability analysis over the abstract state 
space, where on each step we concretize the abstract state set via 7, apply the concrete next- 
state function, and then abstract the results via a. We can view this process as performing 
reachability analysis on an abstract system having initial state set Q A = a(Q c ) and a 
next-state function operating on sets: N a (Sa) = a(Nc("f(S A ))). Note that there is no 
transition relation associated with this next-state function, since 7 cannot be viewed as 
operating on individual abstract states. 

It can be seen that N A provides an abstract interpretation [Cousot and Cousot 1977] of the 
concrete system: 

(1) N A is null-preserving: N A {<D) = 

(2) N A is monotonic: S A C T A => N A (S A ) C N A (T A ). 

(3) N A simulates N c (with a simulation relation defined by a): a(N c (Scj) Q N A (a(Sc))- 

THEOREM 5.1. N A provides an abstract interpretation of the concrete transition sys- 
tem Nc- 

PROOF. Let us prove the three properties mentioned above:

(1) This follows from the definition of $N_A$ and the fact that $\gamma(\emptyset) = \emptyset$, $N_C(\emptyset) = \emptyset$ and
$\alpha(\emptyset) = \emptyset$.

(2) By the definition of $N_A$, and using the fact that $\gamma$, $\alpha$ and $N_C$ are monotonic. $N_C$
is monotonic since it distributes over the elements of a set of concrete states, i.e.

$$N_C(S_C) = \bigcup_{s \in S_C} N_C(s).$$

(3) Recall the two containments of the Galois connection between $\alpha$ and $\gamma$:

$$S_C \subseteq \gamma(\alpha(S_C)) \qquad (8)$$
$$\alpha(\gamma(S_A)) \subseteq S_A \qquad (9)$$

From (8), we know that $S_C \subseteq \gamma(\alpha(S_C))$. By the monotonicity of $N_C$, $N_C(S_C) \subseteq
N_C(\gamma(\alpha(S_C)))$. Since $\alpha$ is monotonic, we have $\alpha(N_C(S_C)) \subseteq \alpha(N_C(\gamma(\alpha(S_C))))$.
Now applying the definition of $N_A$, we get the desired result.

□
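The three properties can also be checked exhaustively on a small finite model of the running example. This is a sketch only: the bounded window, the input range `INPUTS`, and all function names are assumptions made for illustration, and a brute-force check on one toy system is of course no substitute for the proof.

```python
from itertools import combinations, product

# Assumption: a bounded window replaces the integers (illustration only).
INDICES = (-1, 0, 1)       # values of index symbol x
VALUES = (-1, 0, 1)        # values F(u) may take
INPUTS = (-1, 0)           # values of input i, chosen so that i + 1 is in the window
ALL_STATES = set(product(VALUES, repeat=len(INDICES)))

def predicates(state, x):
    return (state[x + 1] >= 0, x >= 0)          # (p, q); F(x) is stored at position x + 1

def alpha(states):
    return {predicates(s, x) for s in states for x in INDICES}

def gamma(abstract_states):
    return {s for s in ALL_STATES
            if all(predicates(s, x) in abstract_states for x in INDICES)}

def next_concrete(states):
    """N_C for the example: the next F maps i to F(i + 1) and is unchanged elsewhere."""
    successors = set()
    for s in states:
        for i in INPUTS:
            t = list(s)
            t[i + 1] = s[i + 2]                 # F(i) := F(i + 1)
            successors.add(tuple(t))
    return successors

def next_abstract(abstract_states):
    """N_A = alpha composed with N_C composed with gamma."""
    return alpha(next_concrete(gamma(abstract_states)))

ABSTRACT = list(product((False, True), repeat=2))
SUBSETS = [set(c) for r in range(5) for c in combinations(ABSTRACT, r)]

null_preserving = next_abstract(set()) == set()
monotonic = all(next_abstract(a) <= next_abstract(b)
                for a in SUBSETS for b in SUBSETS if a <= b)
simulates = all(alpha(next_concrete({s})) <= next_abstract(alpha({s}))
                for s in ALL_STATES)
```

The simulation check is run on singleton concrete sets only, which keeps the enumeration small.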

6. REACHABILITY ANALYSIS 

Predicate abstraction involves performing a reachability analysis over the abstract state
space, where on each step we concretize the abstract states via $\gamma$, apply the concrete
transition relation, and then abstract the results via $\alpha$. In particular, define $R_A^i$, the set of states
reached on step $i$, as:

$$R_A^0 = Q_A^0 \qquad (10)$$
$$R_A^{i+1} = R_A^i \cup N_A(R_A^i) \qquad (11)$$
$$\qquad\; = R_A^i \cup \bigcup_{s \in \gamma(R_A^i)} \; \bigcup_{t \in N_C(s)} \alpha(t) \qquad (12)$$

PROPOSITION 6.1. If $s$ is a reachable state in the concrete system such that $\mathrm{depth}(s) \leq n$,
then $\alpha(s) \subseteq R_A^n$.

PROOF. We prove this by induction on $n$. For $n = 0$, the only concrete states having
depth 0 are those in $Q_C^0$, and by (10), the abstractions of these states are all included in $R_A^0$.

For a state $t$ having depth $k < n$, our induction hypothesis shows that $\alpha(t) \subseteq R_A^{n-1}$. Since
$R_A^{n-1} \subseteq R_A^n$, we therefore have $\alpha(t) \subseteq R_A^n$.

Otherwise, suppose state $t$ has depth $n$. Then there must be some state $s$ having depth $n - 1$
such that $t \in N_C(s)$. By the induction hypothesis, we must have $\alpha(s) \subseteq R_A^{n-1}$. By (8),
we have $s \in \gamma(\alpha(s))$, and Proposition 4.2 then implies that $s \in \gamma(R_A^{n-1})$. By (12), we can
therefore see that $\alpha(t) \subseteq R_A^n$. □

Since the abstract system is finite, there must be some $n$ such that $R_A^n = R_A^{n+1}$. The set of
all reachable abstract states $R_A$ is then $R_A^n$.

PROPOSITION 6.2. The abstract system computes an overapproximation of the set of
reachable concrete states, i.e.,

$$\alpha(R_C) \subseteq R_A \qquad (13)$$

Thus, even though determining the set of reachable concrete states would require examin- 
ing paths of unbounded length, we can compute a conservative approximation to this set 
by performing a bounded reachability analysis on the abstract system. 

Remark 6.3. It is worth noting that we cannot use the standard "frontier set" optimization
in our reachability analysis. This optimization, commonly used in symbolic model
checking, considers only the newly reached states in computing the next set of reachable
states. In our context, this would mean using the computation $R_A^{i+1} = R_A^i \cup N_A(R_A^i - R_A^{i-1})$
rather than that of (12). This optimization is not valid, due to the fact that $\gamma$, and
therefore $N_A$, does not distribute over set union.
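A small enumeration makes the failure of distributivity concrete. The sketch below assumes a hypothetical three-point window in place of the integers and uses the predicates $p$ and $q$ of the running example; the names are illustrative only.

```python
from itertools import product

# Assumption: a bounded window replaces the integers (illustration only).
INDICES = (-1, 0, 1)
VALUES = (-1, 0, 1)
ALL_STATES = set(product(VALUES, repeat=len(INDICES)))

def predicates(state, x):
    return (state[x + 1] >= 0, x >= 0)     # (p, q) at index x

def gamma(abstract_states):
    return {s for s in ALL_STATES
            if all(predicates(s, x) in abstract_states for x in INDICES)}

a = {(False, False)}                       # "not p and not q"
b = {(True, True)}                         # "p and q"
separately = gamma(a) | gamma(b)           # each part alone concretizes to nothing
together = gamma(a | b)                    # the union concretizes to four functions
```

Because a concrete state contributes one valuation per index value, it can belong to the concretization of a union without belonging to the concretization of either part, which is why discarding previously reached abstract states would lose successors.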
As an illustration, let us perform reachability analysis on our example system:

(1) In the initial state, state element F is the identity function, which we have seen abstracts
to the set represented by the formula $p \Leftrightarrow q$. This abstract state set concretizes to the
set of functions $f$ satisfying $f(u) \geq 0 \Leftrightarrow u \geq 0$. This is illustrated in Fig. 3.

(2) Let $h$ denote the value of F in the next state. If input $i$ is $-1$, we would have $h(-1) =
f(0) \geq 0$, but we can still guarantee that $h(u) \geq 0$ for $u \geq 0$. This is illustrated
in Fig. 4. Applying the abstraction function, we get $R_A^1$ characterized by the formula
$p \vee \neg q$ (see Table I).

(3) For the second iteration, the abstract state set characterized by the formula $p \vee \neg q$
concretizes to the set of functions $f$ satisfying $f(u) \geq 0$ when $u \geq 0$, and this condition
must hold in the next state as well. Applying the abstraction function to this set, we
then get $R_A^2 = R_A^1$, and hence the process has converged.




Fig. 4. Reachability after 1 iteration for the example. 
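The iteration traced above can be replayed by brute force on a bounded version of the example. This is a minimal sketch, assuming a three-point window for indices and values and an input range restricted so that $i + 1$ stays inside the window; none of the names below come from the tool itself.

```python
from itertools import product

# Assumption: a bounded window replaces the integers (illustration only).
INDICES = (-1, 0, 1)       # values of index symbol x
VALUES = (-1, 0, 1)        # values F(u) may take
INPUTS = (-1, 0)           # values of input i, kept so that i + 1 is in the window
ALL_STATES = set(product(VALUES, repeat=len(INDICES)))

def predicates(state, x):
    return (state[x + 1] >= 0, x >= 0)            # (p, q) at index x

def alpha(states):
    return {predicates(s, x) for s in states for x in INDICES}

def gamma(abstract_states):
    return {s for s in ALL_STATES
            if all(predicates(s, x) in abstract_states for x in INDICES)}

def next_concrete(states):
    successors = set()
    for s in states:
        for i in INPUTS:
            t = list(s)
            t[i + 1] = s[i + 2]                   # F(i) := F(i + 1)
            successors.add(tuple(t))
    return successors

def reachable_abstract(initial_states):
    """Iterate R <- R | alpha(N_C(gamma(R))) until nothing new is added."""
    reached = alpha(initial_states)               # step 0: abstraction of the initial states
    history = [set(reached)]
    while True:
        grown = reached | alpha(next_concrete(gamma(reached)))
        if grown == reached:
            return reached, history
        reached = grown
        history.append(set(reached))

reached, history = reachable_abstract({(-1, 0, 1)})   # identity function on the window
```

On this window the computation passes through the valuations of $p \Leftrightarrow q$ and stops at those of $p \vee \neg q$; the valuation with $p$ false and $q$ true is never reached.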



7. VERIFYING SAFETY PROPERTIES 

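A Boolean formula $\psi \in E(\mathcal{P})$ can be viewed as defining a property of the abstract state
space. Such a property is said to hold for the abstract system when it holds for every
reachable abstract state. That is, $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for all $\sigma_{\mathcal{P}} \in R_A$.

For Boolean formula $\psi \in E(\mathcal{P})$, define the formula $\psi^* \in E(\mathcal{V} \cup \mathcal{X})$ to be the result of
substituting the predicate expression $\phi_p$ for each predicate symbol $p \in \mathcal{P}$. That is, viewing
$\phi$ as a substitution, we have $\psi^* \doteq \psi[\phi/\mathcal{P}]$.

PROPOSITION 7.1. For any formula $\psi \in E(\mathcal{P})$, any concrete state $s$, and interpretation
$\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$, if $\sigma_{\mathcal{P}} = \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}}$, then $\langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \langle\psi\rangle_{\sigma_{\mathcal{P}}}$.

This is a particular instance of Proposition 2.1.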
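We can view the formula $\psi^*$ as defining a property $\forall\mathcal{X}\psi^*$ of the concrete state space. This
property is said to hold for concrete state $s$, written $\forall\mathcal{X}\psi^*(s)$, when $\langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \mathbf{true}$ for
every $\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$. The property is said to hold for the concrete system when $\forall\mathcal{X}\psi^*(s)$ holds
for every reachable concrete state $s \in R_C$.

With our example system, letting formula $\psi \doteq p \vee \neg q$, and noting that $p \vee \neg q \equiv q \Rightarrow p$,
we get as a property of state variable F that $\forall x : x \geq 0 \Rightarrow F(x) \geq 0$.

PROPOSITION 7.2. Property $\forall\mathcal{X}\psi^*(s)$ holds for concrete state $s$ if and only if $\langle\psi\rangle_{\sigma_{\mathcal{P}}} =
\mathbf{true}$ for every $\sigma_{\mathcal{P}} \in \alpha(s)$.

This property follows from the definition of $\alpha$ (Equation 1) and Proposition 7.1.

Alternately, a Boolean formula $\psi \in E(\mathcal{P})$ can also be viewed as characterizing a set of
abstract states $\langle\psi\rangle \doteq \{\sigma_{\mathcal{P}} \mid \langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}\}$. Similarly, we can interpret the formula
$\forall\mathcal{X}\psi^*$ as characterizing the set of concrete states $\langle\forall\mathcal{X}\psi^*\rangle \doteq \{s \mid \langle\forall\mathcal{X}\psi^*\rangle_s = \mathbf{true}\}$.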

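PROPOSITION 7.3. If $S_C = \langle\forall\mathcal{X}\psi^*\rangle$ and $S_A = \langle\psi\rangle$, then $S_C = \gamma(S_A)$.

PROOF. Expanding the definition of $S_C$, we get

$$S_C = \{s \mid \forall \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}} : \langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \mathbf{true}\} \qquad (14)$$
$$\quad = \{s \mid \forall \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}} : \sigma_{\mathcal{P}} = \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}} \Rightarrow \langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}\} \qquad (15)$$
$$\quad = \{s \mid \forall \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}} : \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}} \in S_A\} \qquad (16)$$

Observe that (14) expands the definition of $S_C$, (15) follows from (14) by using Proposition 7.1,
and (16) follows from (15) by the definition of $S_A$. By the definition of $\gamma$ (Equation 5), the
set in (16) is $\gamma(S_A)$. □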

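The purpose of predicate abstraction is to provide a way to verify that a property $\forall\mathcal{X}\psi^*$
holds for the concrete system based on the set of reachable abstract states.

THEOREM 7.4. For a formula $\psi \in E(\mathcal{P})$, if property $\psi$ holds for the abstract system,
then property $\forall\mathcal{X}\psi^*$ holds for the concrete system.

PROOF. Consider an arbitrary concrete state $s \in R_C$ and an arbitrary interpretation
$\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$. If we let $\sigma_{\mathcal{P}} = \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}}$, then by the definition of $\alpha$ (Equation 1), we must have
$\sigma_{\mathcal{P}} \in \alpha(s)$. By Propositions 4.1 and 6.2, we therefore have

$$\sigma_{\mathcal{P}} \in \alpha(s) \subseteq \alpha(R_C) \subseteq R_A$$

By the premise of the theorem we have $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$, and by Proposition 7.1, we have
$\langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$. This is precisely the condition required for the property $\forall\mathcal{X}\psi^*$
to hold for the concrete system. □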

Thus, the abstract reachability analysis on our example system does indeed prove the property
that any value $f$ of state variable F satisfies $\forall x : x \geq 0 \Rightarrow f(x) \geq 0$.

Using predicate abstraction, we can possibly get a false negative result, where we fail
to verify a property $\forall\mathcal{X}\psi^*$, even though it holds for the concrete system, because the
given set of predicates does not adequately capture the characteristics of the system that
ensure the desired property. Thus, this method of verifying properties is sound, but possibly
incomplete.
For example, any reachable state $f$ of our example system satisfies $\forall x : f(x) < 0 \Rightarrow
f(-x) \geq -x \geq 0$, but our reachability analysis cannot show this.

We can, however, precisely characterize the class of properties for which this form of
predicate analysis is both sound and complete. A property $\forall\mathcal{X}\psi^*$ is said to be inductive
for the concrete system when it satisfies the following two properties:

(1) Every initial state $s \in Q_C^0$ satisfies $\forall\mathcal{X}\psi^*(s)$.

(2) For every pair of concrete states $(s, t)$, such that $t \in N_C(s)$, if $\forall\mathcal{X}\psi^*(s)$ holds, then
so does $\forall\mathcal{X}\psi^*(t)$.

PROPOSITION 7.5. If $\forall\mathcal{X}\psi^*$ is inductive, then $\forall\mathcal{X}\psi^*$ holds for the concrete system.

This proposition follows by induction on the state sequence leading to each reachable state. 
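On a finite model, the two requirements of an inductive property can be checked directly. The sketch below assumes a bounded three-point window for the running example; `holds` and `is_inductive` are hypothetical helper names, and a formula over $p$ and $q$ is represented as a Python function of two Booleans.

```python
from itertools import product

# Assumption: a bounded window replaces the integers (illustration only).
INDICES = (-1, 0, 1)
VALUES = (-1, 0, 1)
INPUTS = (-1, 0)
ALL_STATES = set(product(VALUES, repeat=len(INDICES)))

def predicates(state, x):
    return (state[x + 1] >= 0, x >= 0)            # (p, q) at index x

def next_concrete(state):
    successors = set()
    for i in INPUTS:
        t = list(state)
        t[i + 1] = state[i + 2]                   # F(i) := F(i + 1)
        successors.add(tuple(t))
    return successors

def holds(psi, state):
    """The universally quantified property: psi holds at every index value."""
    return all(psi(*predicates(state, x)) for x in INDICES)

def is_inductive(psi, initial_states):
    base = all(holds(psi, s) for s in initial_states)
    step = all(holds(psi, t)
               for s in ALL_STATES if holds(psi, s)
               for t in next_concrete(s))
    return base and step

q_implies_p = lambda p, q: (not q) or p           # the formula p or not q
p_iff_q = lambda p, q: p == q

identity = (-1, 0, 1)
strong_is_inductive = is_inductive(p_iff_q, {identity})
weak_is_inductive = is_inductive(q_implies_p, {identity})
```

Here $p \Leftrightarrow q$ holds initially but is not preserved by the step with input $-1$, whereas $p \vee \neg q$ passes both checks on this window.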

Let $\rho_A$ be a formula that exactly characterizes the set of reachable abstract states. That is,
$\langle\rho_A\rangle = R_A$.

LEMMA 7.6. $\forall\mathcal{X}\rho_A^*$ is inductive.

PROOF. By definition, $\langle\rho_A\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ if and only if $\sigma_{\mathcal{P}} \in R_A$, and so by Proposition
7.2, $\forall\mathcal{X}\rho_A^*(s)$ holds for concrete state $s$ if and only if $\alpha(s) \subseteq R_A$.

We can see that the first requirement is satisfied for any $s \in Q_C^0$, since $\alpha(s) \subseteq \alpha(Q_C^0) \subseteq R_A$
and therefore $\forall\mathcal{X}\rho_A^*(s)$ holds by Proposition 7.2.

Now suppose there is a state $t \in N_C(s)$ and $\forall\mathcal{X}\rho_A^*(s)$ holds. Then we must have $\alpha(s) \subseteq
R_A^i$ for some $i \geq 0$. From (8), we have $s \in \gamma(\alpha(s)) \subseteq \gamma(R_A^i)$, and therefore, by (12),
$\alpha(t) \subseteq R_A^{i+1} \subseteq R_A$. Thus, the second requirement is satisfied. □

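LEMMA 7.7. If $\forall\mathcal{X}\psi^*$ is inductive, then $\psi$ holds for the abstract system.

PROOF. We will prove by induction on $i$ that $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for every $\sigma_{\mathcal{P}} \in R_A^i$. From
the definition of $R_A$, it then follows that $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for every $\sigma_{\mathcal{P}} \in R_A$, and therefore
$\psi$ holds for the abstract system.

For the case of $i = 0$, (10) indicates that $R_A^0 = \alpha(Q_C^0)$. Thus, by the definition of $\alpha$
(Equation 1), for every $\sigma_{\mathcal{P}} \in R_A^0$ there must be a state $s \in Q_C^0$ and an interpretation $\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$
such that $\sigma_{\mathcal{P}} = \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}}$. By the first property of an inductive predicate and by Proposition
7.1, we have $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \mathbf{true}$.

Now suppose that $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for all $\sigma_{\mathcal{P}} \in R_A^i$. Consider an element $\tau_{\mathcal{P}} \in R_A^{i+1}$. If
$\tau_{\mathcal{P}} \in R_A^i$, then our induction hypothesis shows that $\langle\psi\rangle_{\tau_{\mathcal{P}}} = \mathbf{true}$. Otherwise, by (12),
and the definitions of $\alpha$ (Equation 1), the transition relation $N_C$, and $\gamma$ (Equation 5), there
must be concrete states $s$ and $t$ satisfying:

(1) $\tau_{\mathcal{P}} \in \alpha(t)$. That is, $\tau_{\mathcal{P}} = \langle\phi\rangle_{t \cdot \tau_{\mathcal{X}}}$ for some $\tau_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$.

(2) $t \in N_C(s)$.

(3) $s \in \gamma(R_A^i)$. That is, for all $\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$, if $\sigma_{\mathcal{P}} = \langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}}$, then $\sigma_{\mathcal{P}} \in R_A^i$.

By Proposition 7.1 we have $\langle\psi^*\rangle_{s \cdot \sigma_{\mathcal{X}}} = \langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$, and therefore $\forall\mathcal{X}\psi^*(s)$ holds.
By the second property of an inductive predicate, $\forall\mathcal{X}\psi^*(t)$ must also hold. Applying
Proposition 7.1 once again, we therefore have $\langle\psi\rangle_{\tau_{\mathcal{P}}} = \langle\psi^*\rangle_{t \cdot \tau_{\mathcal{X}}} = \mathbf{true}$. This completes
our induction. □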

This lemma simply shows that if we present our predicate abstraction engine with a fully 
formed induction hypothesis, then it will be able to perform the induction proof. But, it 
has important consequences. 

For a formula $\psi \in E(\mathcal{P})$ and a predicate set $\phi$, the property $\forall\mathcal{X}\psi^*$ is said to have an
induction proof over $\phi$ when there is some formula $\chi \in E(\mathcal{P})$, such that $\chi \Rightarrow \psi$ and
$\forall\mathcal{X}\chi^*$ is inductive. That is, there is some way to strengthen $\psi$ into a formula $\chi$ that can be
used to prove the property by induction.

THEOREM 7.8. A formula $\psi \in E(\mathcal{P})$ is a property of the abstract system if and only if
the concrete property $\forall\mathcal{X}\psi^*$ has an induction proof over the predicate set $\phi$.

PROOF. Suppose there is a formula $\chi$ such that $\forall\mathcal{X}\chi^*$ is inductive. Then by Lemma
7.7, we know that $\chi$ holds in the abstract system, and when $\chi \Rightarrow \psi$, we can infer that $\psi$
holds in the abstract system.

On the other hand, suppose that $\psi$ holds in the abstract system. Then the formula $\rho_A$
(characterizing the set of all reachable abstract states) satisfies $\rho_A \Rightarrow \psi$ and, by Lemma 7.6,
$\forall\mathcal{X}\rho_A^*$ is inductive. Hence $\forall\mathcal{X}\psi^*$ has an induction proof over $\phi$. □

This theorem precisely characterizes the capability of our formulation of predicate abstrac- 
tion — it can prove any property that can be strengthened into an induction hypothesis 
using some combination of the predicates. Thus, if we fail to verify a system using this 
form of predicate abstraction, we can conclude that either 1) the system does not satisfy the 
property, or 2) we did not provide an adequate set of predicates out of which the predicate 
abstraction engine could construct a universally quantified induction hypothesis. 

COROLLARY 7.9. The property $\forall\mathcal{X}\rho_A^*$ is the strongest inductive invariant for the concrete
system of the form $\forall\mathcal{X}\chi^*$, where $\chi \in E(\mathcal{P})$. Alternately, for any other inductive
property $\forall\mathcal{X}\chi^*$, where $\chi \in E(\mathcal{P})$, $\forall\mathcal{X}\rho_A^* \Rightarrow \forall\mathcal{X}\chi^*$.

PROOF. The proof follows easily from Theorem 7.8, the fact that $\rho_A \Rightarrow \chi$ whenever $\chi$
is a property of the abstract state space, Proposition 7.3 and Proposition 4.2. □

Remark 7.10. To fully automate the process of generating invariants, we need to fur- 
ther discover the predicates automatically. Other predicate abstraction tools [Ball et al. 
2001; Henzinger et al. 2002; Chaki et al. 2003; Das and Dill 2002] generate new pred- 
icates based on ruling out spurious counterexample traces from the abstract model. This 
approach cannot be used directly in our context, since our abstract system cannot be viewed 
as a state transition system, and so there is no way to characterize a counterexample by a 
single state sequence. In this paper, we do not address the issue of discovering the indexed 
predicates: we provide a syntactic heuristic based on the weakest precondition transformer 
in a separate work [Lahiri and Bryant 2004]. 
8. QUANTIFIER INSTANTIATION

For many subsets of first-order logic, there is no complete method for handling the universal
quantifier introduced in function $\gamma$ (Equation 5). For example, in a logic with uninterpreted
functions and equality, determining whether a universally quantified formula
is satisfiable is undecidable [Borger et al. 1997]. Instead, we concretize abstract states
by considering some limited subset of the interpretations of the index symbols, each of
which is defined by a substitution for the symbols in $\mathcal{X}$. Our tool automatically generates
candidate substitutions based on the subexpressions that appear in the predicate and
next-state expressions. Details of the quantifier instantiation heuristic can be found in an
earlier work [Lahiri et al. 2002]. These subexpressions can contain symbols in $\mathcal{V}$, $\mathcal{X}$, and
$\mathcal{I}$. These instantiated versions of the formulas enable the verifier to detect specific cases
where the predicates can be applied.

More precisely, let $\pi$ be a substitution assigning an expression $\pi_x \in E(\mathcal{V} \cup \mathcal{X} \cup \mathcal{I})$ for
each $x \in \mathcal{X}$. Then $\phi_p[\pi/\mathcal{X}]$ will be a Boolean expression over symbols $\mathcal{V}$, $\mathcal{X}$, and $\mathcal{I}$ that
represents some instantiation of predicate $\phi_p$.

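For a set of substitutions $\Pi$ and interpretations $\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$ and $\sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}}$, we define the
concretization function $\gamma_\Pi$ as:

$$\gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}}) \doteq \{s \mid \forall \pi \in \Pi : \langle\phi[\pi/\mathcal{X}]\rangle_{s \cdot \sigma_{\mathcal{X}} \cdot \sigma_{\mathcal{I}}} \in S_A\} \qquad (17)$$

PROPOSITION 8.1. For any abstract state set $S_A$ and interpretations $\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$ and
$\sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}}$:

(1) $\gamma(S_A) \subseteq \gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})$ for any set of substitutions $\Pi$.

(2) $\gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}}) \subseteq \gamma_{\Pi'}(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})$ for any pair of substitution sets $\Pi$ and $\Pi'$
satisfying $\Pi \supseteq \Pi'$.

(3) For any abstract state set $T_A$, if $S_A \subseteq T_A$, then $\gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}}) \subseteq \gamma_\Pi(T_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})$,
for any set of substitutions $\Pi$.

These properties follow directly from the definitions of $\gamma$ and $\gamma_\Pi$ and Proposition 2.1.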

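PROPOSITION 8.2. For any concrete state set $S_C$, set of substitutions $\Pi$, and interpretations
$\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}$ and $\sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}}$:

$$S_C \subseteq \gamma_\Pi(\alpha(S_C), \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}}). \qquad (18)$$

This property follows directly from Theorem 4.3 and Proposition 8.1. It shows that for
a given interpretation $\sigma_{\mathcal{X}}$ and $\sigma_{\mathcal{I}}$, the functions $(\alpha, \gamma_\Pi)$ satisfy one of the properties of
a Galois connection (Equation 8), but they need not satisfy the other (Equation 9). For
example, when $\Pi = \emptyset$, the quantified condition of (17) becomes vacuous, and hence
$\gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}}) = \Sigma_{\mathcal{V}}$.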
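The effect of the substitution set on the concretization can be seen by enumeration. The sketch below assumes a bounded three-point window for the running example and models a substitution as a Python function from the fixed interpretations of $x$ and $i$ to the index value at which the predicates are evaluated; `gamma_pi`, `pi_x`, and `pi_next` are hypothetical names.

```python
from itertools import product

# Assumption: a bounded window replaces the integers (illustration only).
INDICES = (-1, 0, 1)       # interpretations of index symbol x
VALUES = (-1, 0, 1)
INPUTS = (-1, 0)           # interpretations of input i, kept so i + 1 is in the window
ALL_STATES = set(product(VALUES, repeat=len(INDICES)))

def predicates(state, index):
    return (state[index + 1] >= 0, index >= 0)    # (p, q) evaluated at one index value

def gamma(abstract_states):
    return {s for s in ALL_STATES
            if all(predicates(s, x) in abstract_states for x in INDICES)}

def gamma_pi(abstract_states, substitutions, x, i):
    """Concretization using only the instantiations named by the substitutions."""
    return {s for s in ALL_STATES
            if all(predicates(s, pi(x, i)) in abstract_states for pi in substitutions)}

pi_x = lambda x, i: x            # substitute x for x
pi_next = lambda x, i: i + 1     # substitute i + 1 for x

S_A = {(False, False), (True, True)}              # the valuations of p <=> q
exact = gamma(S_A)
chain_holds = all(
    exact
    <= gamma_pi(S_A, [pi_x, pi_next], x, i)
    <= gamma_pi(S_A, [pi_x], x, i)
    <= gamma_pi(S_A, [], x, i)
    for x in INDICES for i in INPUTS)
one_instance = gamma_pi(S_A, [pi_x], 0, 0)        # only constrains F(0)
vacuous = gamma_pi(S_A, [], 0, 0)                 # empty substitution set: every state
```

Fewer substitutions give a coarser concretization, and the empty set of substitutions admits every concrete state.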

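We can use $\gamma_\Pi$ as an approximation to $\gamma$ in defining the behavior of the abstract system.
That is, define $N_\Pi$ over sets of abstract states as:

$$N_\Pi(S_A) = \{\langle\phi[\delta/\mathcal{V}]\rangle_{s \cdot \sigma_{\mathcal{X}} \cdot \sigma_{\mathcal{I}}} \mid \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}},\ \sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}},\ s \in \gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})\} \qquad (19)$$
$$\qquad = \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \; \bigcup_{\sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}}} \; \bigcup_{s \in \gamma_\Pi(S_A, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})} \{\langle\phi[\delta/\mathcal{V}]\rangle_{s \cdot \sigma_{\mathcal{X}} \cdot \sigma_{\mathcal{I}}}\} \qquad (20)$$

Observe in this equation that $\phi_p[\delta/\mathcal{V}]$ is an expression describing the evaluation of predicate
$\phi_p$ in the next state.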

It can be seen that $N_\Pi(S_A) \supseteq N_A(S_A)$ for any set of abstract states $S_A$. As long as $\Pi$
is nonempty (required to guarantee that $N_\Pi$ is null-preserving), it can be shown that the
system defined by $N_\Pi$ is an abstract interpretation of the concrete system:

(1) $N_\Pi(\emptyset) = \emptyset$, if $\Pi$ is nonempty.

(2) $N_\Pi$ is monotonic: This follows from the definition of $N_\Pi$ in (20) and Proposition 8.1.

(3) $\alpha(N_C(S_C)) \subseteq N_\Pi(\alpha(S_C))$: This follows from the fact that $\alpha(N_C(S_C)) \subseteq N_A(\alpha(S_C))$
and $N_A(S_A) \subseteq N_\Pi(S_A)$.

We can therefore perform reachability analysis:

$$R_\Pi^0 = Q_A^0 \qquad (21)$$
$$R_\Pi^{i+1} = R_\Pi^i \cup N_\Pi(R_\Pi^i) \qquad (22)$$

These iterations will converge to a set $R_\Pi$.

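PROPOSITION 8.3.

(1) $R_A \subseteq R_\Pi$ for any set of substitutions $\Pi$.
(2) $R_\Pi \subseteq R_{\Pi'}$ for any pair of substitution sets $\Pi$ and $\Pi'$ satisfying $\Pi \supseteq \Pi'$.

To see the first property, consider the following way of expressing the equation for $R_A^{i+1}$
(12) using the alternative equation for $\alpha$ (4), and rearranging the order of the union operations:

$$R_A^{i+1} = R_A^i \cup \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \; \bigcup_{\sigma_{\mathcal{I}} \in \Sigma_{\mathcal{I}}} \; \bigcup_{s \in \gamma(R_A^i)} \{\langle\phi[\delta/\mathcal{V}]\rangle_{s \cdot \sigma_{\mathcal{X}} \cdot \sigma_{\mathcal{I}}}\}$$

The property then follows by Proposition 8.1, using induction on $i$. The second property
also follows by Proposition 8.1 using induction on $i$.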

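THEOREM 8.4. For a formula $\psi \in E(\mathcal{P})$, if $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for every $\sigma_{\mathcal{P}} \in R_\Pi$, then
property $\forall\mathcal{X}\psi^*$ holds for the concrete system.

PROOF. Since $\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for every $\sigma_{\mathcal{P}} \in R_\Pi$ and $R_A \subseteq R_\Pi$ (by Proposition 8.3),
$\langle\psi\rangle_{\sigma_{\mathcal{P}}} = \mathbf{true}$ for every $\sigma_{\mathcal{P}} \in R_A$. Hence by Theorem 7.4, the property $\forall\mathcal{X}\psi^*$ holds for
the concrete system. □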
This demonstrates that using quantifier instantiation during reachability analysis yields a
sound verification technique. However, when the tool fails to verify a property, it could 
mean, in addition to the two possibilities listed earlier, that 3) it used an inadequate set of 
instantiations, or 4) that the property cannot be proved by any bounded quantifier instanti- 
ation. 

9. SYMBOLIC FORMULATION OF REACHABILITY ANALYSIS 

We are now ready to express the reachability computation symbolically, where each step 
involves finding the set of satisfying solutions to a quantified CLU formula. We will then 
see how this can be converted into a problem of finding satisfying solutions to a Boolean 
formula. 

On each step, we generate a Boolean formula $\rho_\Pi^i$ that characterizes $R_\Pi^i$. That is, $\langle\rho_\Pi^i\rangle = R_\Pi^i$.
The formulas directly encode the approximate reachability computations of (21) and (22).

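Observe that by composing the predicate expressions with the initial state expressions,
$\phi[q^0/\mathcal{V}]$, we get a set of predicates over the initial state symbols $\mathcal{J}$ indicating the conditions
under which the predicates hold in the initial state. We can therefore start the
reachability analysis by finding solutions to the formula

$$\rho_\Pi^0(\mathcal{P}) = \exists\mathcal{X}\,\exists\mathcal{J} \bigwedge_{p \in \mathcal{P}} p \Leftrightarrow \phi_p[q^0/\mathcal{V}] \qquad (23)$$

PROPOSITION 9.1. $\langle\rho_\Pi^0\rangle = Q_A^0$.

Let us understand the expression $\rho_\Pi^0$ by showing why it represents $Q_A^0$. Expanding the
definition of $Q_A^0$, we get:

$$Q_A^0 = \bigcup_{s \in Q_C^0} \; \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \{\langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}}\} \qquad (24)$$

Again, $Q_C^0 = \bigcup_{\sigma_{\mathcal{J}} \in \Sigma_{\mathcal{J}}} \{\langle q^0\rangle_{\sigma_{\mathcal{J}}}\}$. Using Proposition 2.1, we can rewrite (24) as:

$$Q_A^0 = \bigcup_{\sigma_{\mathcal{J}} \in \Sigma_{\mathcal{J}}} \; \bigcup_{\sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}} \{\langle\phi[q^0/\mathcal{V}]\rangle_{\sigma_{\mathcal{J}} \cdot \sigma_{\mathcal{X}}}\} \qquad (25)$$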

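To generate a formula for the next-state computation, we first generate a formula for
$\gamma_\Pi(R_\Pi^i, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})$ by forming a conjunction over each substitution in $\Pi$, where we compose
the current-state formula with the predicate expressions and with each substitution $\pi$:

$$\bigwedge_{\pi \in \Pi} \left(\rho_\Pi^i[\phi/\mathcal{P}]\right)[\pi/\mathcal{X}]$$

The formula for the next-state computation combines the alternate definition of $N_\Pi$ (20)
and the formula for $\gamma_\Pi$ above:

$$\rho_\Pi^{i+1}(\mathcal{P}) = \rho_\Pi^i(\mathcal{P}) \;\vee\; \exists\mathcal{V}\,\exists\mathcal{X}\,\exists\mathcal{I} \left( \bigwedge_{\pi \in \Pi} \left(\rho_\Pi^i[\phi/\mathcal{P}]\right)[\pi/\mathcal{X}] \;\wedge\; \bigwedge_{p \in \mathcal{P}} p \Leftrightarrow \phi_p[\delta/\mathcal{V}] \right). \qquad (26)$$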
To understand the quantified term in this equation, note that the left-hand term is the formula
for $\gamma_\Pi(R_\Pi^i, \sigma_{\mathcal{X}}, \sigma_{\mathcal{I}})$, while the right-hand term expresses the conditions under which
each abstract state variable $p$ will match the value of the corresponding predicate in the
next state.

PROPOSITION 9.2. $\langle\rho_\Pi^{i+1}\rangle = R_\Pi^{i+1}$.

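Let us see how this symbolic formulation would perform reachability analysis for our example
system. Recall that our system has two predicates $\phi_p \doteq F(x) \geq 0$ and $\phi_q \doteq x \geq 0$.
In the initial state, F is the function $\lambda u\,.\,u$, and therefore $\phi_p[q^0/\mathcal{V}]$ simply becomes $x \geq 0$.
Equation (23) then becomes $\exists x\,[(p \Leftrightarrow x \geq 0) \wedge (q \Leftrightarrow x \geq 0)]$, which reduces to $p \Leftrightarrow q$.

Now let us perform the first iteration. For our instantiations we require two substitutions
$\pi$ and $\pi'$ with $\pi_x = x$ and $\pi'_x = i + 1$. For $\rho_\Pi^0(p, q) = p \Leftrightarrow q$, the left-hand term of
(26) instantiates to $(F(x) \geq 0 \Leftrightarrow x \geq 0) \wedge (F(i+1) \geq 0 \Leftrightarrow i + 1 \geq 0)$. Substituting
$\lambda u\,.\,ITE(u = i, F(i+1), F(u))$ for F in $\phi_p$ gives $(x = i \wedge F(i+1) \geq 0) \vee (x \neq i \wedge F(x) \geq 0)$.

The quantified portion of (26) for $\rho_\Pi^1(p, q)$ then becomes:

$$\exists F, x, i : \left[ \begin{array}{l} (F(x) \geq 0 \Leftrightarrow x \geq 0) \;\wedge\; (F(i+1) \geq 0 \Leftrightarrow i + 1 \geq 0) \\ \wedge\; p \Leftrightarrow \left( (x = i \wedge F(i+1) \geq 0) \vee (x \neq i \wedge F(x) \geq 0) \right) \\ \wedge\; q \Leftrightarrow x \geq 0 \end{array} \right] \qquad (27)$$

The only values of $p$ and $q$ for which this formula cannot be satisfied are $p$ false and $q$
true.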
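The claim about (27) can be cross-checked by enumerating small integer values. The following is only a sketch: the window `RANGE` is an assumption, the two values $F(x)$ and $F(i+1)$ are treated as free integers subject to functional consistency, and the predicates are read as non-strict comparisons with 0.

```python
from itertools import product

RANGE = range(-2, 3)        # assumed finite window for x, i, F(x) and F(i + 1)

def ge0(v):
    return v >= 0

solutions = set()
for x, i, fx, fi1 in product(RANGE, repeat=4):
    if x == i + 1 and fx != fi1:
        continue            # F is a function: equal arguments must give equal values
    # Left-hand term of (26): the current abstract state set p <=> q, instantiated
    # with the substitutions x and i + 1.
    if ge0(fx) != ge0(x) or ge0(fi1) != ge0(i + 1):
        continue
    # Right-hand term: values of the predicates in the next state.
    p = (x == i and ge0(fi1)) or (x != i and ge0(fx))
    q = ge0(x)
    solutions.add((p, q))
```

Every combination except $p$ false with $q$ true has a witness inside the window, matching the formula $p \vee \neg q$ obtained for the first iteration.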

As shown in [Lahiri et al. 2003], we can generate the set of solutions to (23) and (26) 
by first transforming the formulas into equivalent Boolean formulas and then performing 
quantifier elimination to remove all Boolean variables other than those in V. This quantifier 
elimination is similar to the relational product operation used in symbolic model checking 
and can be solved using either BDD or SAT-based methods. 

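10. USING A SAT SOLVER TO PERFORM REACHABILITY ANALYSIS

Observe that (26) has a general form $\chi'(\mathcal{P}) = \chi(\mathcal{P}) \vee \exists\mathcal{A}\,\theta(\mathcal{A}, \mathcal{P})$, where $\theta$ is a quantifier-free
CLU formula, $\mathcal{A}$ contains Boolean, integer, function, and predicate symbols, and $\mathcal{P}$
contains only Boolean symbols. Several methods (including those in [Bryant et al. 2002b;
Strichman et al. 2002; Bryant et al. 2002a]) have been developed to transform a quantifier-free
CLU formula $\theta(\mathcal{A}, \mathcal{P})$ into a Boolean formula $\tilde{\theta}(\tilde{\mathcal{A}}, \mathcal{P})$, where $\tilde{\mathcal{A}}$ is now a set of
Boolean variables, in a way that preserves satisfiability.

By taking care [Lahiri et al. 2003], this transformation can be performed in a way that
preserves the set of satisfying solutions for the symbols in $\mathcal{P}$. That is:

$$\{\sigma_{\mathcal{P}} \mid \exists \sigma_{\mathcal{A}} : \langle\theta\rangle_{\sigma_{\mathcal{A}} \cdot \sigma_{\mathcal{P}}} = \mathbf{true}\} = \{\sigma_{\mathcal{P}} \mid \exists \sigma_{\tilde{\mathcal{A}}} : \langle\tilde{\theta}\rangle_{\sigma_{\tilde{\mathcal{A}}} \cdot \sigma_{\mathcal{P}}} = \mathbf{true}\} \qquad (28)$$

Based on such a transformation, we can generate a Boolean formula for $\chi'$ by repeatedly
calling a Boolean SAT solver, yielding one solution with each call. In this presentation, we
consider an interpretation $\sigma_{\mathcal{P}}$ to represent a Boolean formula consisting of a conjunction
of literals: $p$ when $\sigma_{\mathcal{P}}(p) = \mathbf{true}$ and $\neg p$ when $\sigma_{\mathcal{P}}(p) = \mathbf{false}$. Starting with $\chi' = \chi$ and
$\theta' = \tilde{\theta} \wedge \neg\chi$, we perform iterations:

$$\sigma_{\tilde{\mathcal{A}}}, \sigma_{\mathcal{P}} \leftarrow \mathrm{SATSolve}(\theta')$$
$$\chi' \leftarrow \chi' \vee \sigma_{\mathcal{P}}$$
$$\theta' \leftarrow \theta' \wedge \neg\sigma_{\mathcal{P}}$$

until $\theta'$ is unsatisfiable.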

To illustrate this process, let us solve (27) to perform the first iteration of reachability
analysis on our example system. We can translate the right-hand term into Boolean form
by introducing Boolean variables $a$, $b$, $c$, $d$ and $e$ encoding the predicates $F(x) \geq 0$, $x \geq 0$,
$F(i+1) \geq 0$, $i + 1 \geq 0$, and $x = i$, respectively.

The portion of (27) within square brackets then becomes

$$a \Leftrightarrow b \;\wedge\; c \Leftrightarrow d \;\wedge\; (p \Leftrightarrow [(e \wedge c) \vee (\neg e \wedge a)]) \;\wedge\; (q \Leftrightarrow b).$$

To this, let us add the consistency constraint: $e \wedge b \Rightarrow d$ (encoding the property that
$x = i \wedge x \geq 0 \Rightarrow i + 1 \geq 0$). Although the translation schemes will add many more
constraints (e.g., those involving uninterpreted function symbols), the above constraint is
sufficient to preserve the property described in (28). For simplicity, we will not describe
the other constraints that would be added by the algorithms in [Lahiri et al. 2003]. Finally,
all the symbols apart from $p$ and $q$ are existentially quantified out.

It is easy to verify that the equation above with the consistency constraint is unsatisfiable
only for the assignment when $p$ is false and $q$ is true.
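The enumeration loop and the Boolean example can be sketched together. This is an illustration under stated assumptions, not the tool's implementation: `sat_solve` is a brute-force stand-in for a real SAT solver, the loop starts from an empty set of known abstract states instead of a given $\chi$, and blocking a found solution is modeled by excluding its $(p, q)$ projection.

```python
from itertools import product

AUX = ("a", "b", "c", "d", "e")     # auxiliary Boolean variables, quantified out
NAMES = AUX + ("p", "q")

def theta(env):
    """Boolean encoding of the bracketed part of (27) plus the consistency constraint."""
    a, b, c, d, e, p, q = (env[name] for name in NAMES)
    body = (a == b) and (c == d) and (p == ((e and c) or ((not e) and a))) and (q == b)
    consistency = (not (e and b)) or d          # e and b imply d
    return body and consistency

def sat_solve(formula):
    """Brute-force stand-in for a SAT solver: return one model, or None if unsatisfiable."""
    for bits in product((False, True), repeat=len(NAMES)):
        env = dict(zip(NAMES, bits))
        if formula(env):
            return env
    return None

def enumerate_projections(formula):
    """Collect all (p, q) solutions, adding one blocking constraint per solver call."""
    found = []
    while True:
        model = sat_solve(lambda env: formula(env) and (env["p"], env["q"]) not in found)
        if model is None:
            return set(found)
        found.append((model["p"], model["q"]))

abstract_states = enumerate_projections(theta)
```

Each call yields one new abstract state, so the number of solver calls is the number of solutions plus one final unsatisfiable call.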



11. AXIOMS

As a special class of predicates, we may have some that are to hold at all times. For
example, we could have an axiom $f(x) > 0$ to indicate that function $f$ is always positive,
or $f(y, z) = f(z, y)$ to indicate that $f$ is commutative. Typically, we want these predicates
to be individually quantified, and we can ensure this by defining each of them over a unique
set of index symbols, as we have done in the above examples.

We can add this feature to our analysis by identifying a subset $\mathcal{Q}$ of the predicate symbols $\mathcal{P}$
to be axioms. We then want to restrict the analysis to states where the axiomatic predicates
hold. Let $\Sigma_{\mathcal{P}}^{\mathcal{Q}}$ denote the set of abstract states $\sigma_{\mathcal{P}}$ where $\sigma_{\mathcal{P}}(p) = \mathbf{true}$ for every $p \in \mathcal{Q}$.
Then we can apply this restriction by redefining $\alpha(s)$ (Equation 1) for concrete state $s$ to
be:

$$\alpha(s) \doteq \{\langle\phi\rangle_{s \cdot \sigma_{\mathcal{X}}} \mid \sigma_{\mathcal{X}} \in \Sigma_{\mathcal{X}}\} \cap \Sigma_{\mathcal{P}}^{\mathcal{Q}} \qquad (29)$$

and then using this definition in the extension of $\alpha$ to sets (Equation 3), the formulation of
the reachability analysis (Equations 10 and 12), and the approximate reachability analysis
(Equations 21 and 22).
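The restriction in (29) amounts to filtering out abstract states in which some axiomatic predicate is false. A minimal sketch, assuming abstract states are tuples of Booleans and axioms are identified by their positions in the tuple (both representation choices are hypothetical):

```python
def restrict_to_axioms(abstract_states, axiom_positions):
    """Keep only the abstract states in which every axiomatic predicate is true."""
    return {sigma for sigma in abstract_states
            if all(sigma[k] for k in axiom_positions)}

# Three predicates; the one at position 2 is declared to be an axiom.
raw = {(True, False, True), (True, True, False), (False, True, True)}
restricted = restrict_to_axioms(raw, axiom_positions=(2,))
```

Applying the same filter after each abstraction step mirrors the redefinition of the abstraction function used throughout the reachability computation.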
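The symbolic formulation of the approximate reachability analysis then becomes:

$$\rho_\Pi^0(\mathcal{P}) = \exists\mathcal{X}\,\exists\mathcal{J} \left( \bigwedge_{p \in \mathcal{P} - \mathcal{Q}} p \Leftrightarrow \phi_p[q^0/\mathcal{V}] \;\wedge\; \bigwedge_{p \in \mathcal{Q}} \phi_p[q^0/\mathcal{V}] \right)$$

$$\rho_\Pi^{i+1}(\mathcal{P}) = \rho_\Pi^i(\mathcal{P}) \;\vee\; \exists\mathcal{V}\,\exists\mathcal{X}\,\exists\mathcal{I} \left( \bigwedge_{\pi \in \Pi} \left(\rho_\Pi^i[\phi/\mathcal{P}]\right)[\pi/\mathcal{X}] \;\wedge\; \bigwedge_{p \in \mathcal{P} - \mathcal{Q}} p \Leftrightarrow \phi_p[\delta/\mathcal{V}] \;\wedge\; \bigwedge_{p \in \mathcal{Q}} \phi_p[\delta/\mathcal{V}] \right)$$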

12. APPLICATIONS 

We have integrated the method described in this paper into UCLID [Bryant et al. 2002b], 
a tool for modeling and verifying infinite-state systems. We have used our predicate ab- 
straction tool to verify safety properties of a variety of models and protocols. Some of the 
more interesting ones include: 

(1) A microprocessor out-of-order execution unit with an unbounded retirement buffer. 
Prior verification of this unit required manually generating 13 invariants [Lahiri et al. 
2002]. The verification did not require any auxiliary invariants from the user and the 
proof script (which consists of the 13 simple predicates) is more compact than other 
verification efforts of similar models based on compositional model checking [McMil- 
lan 1998] or theorem proving methods [Arons and Pnueli 1999; Hosabettu et al. 1999]. 

(2) A directory-based cache protocol with unbounded channels, devised by Steven Ger- 
man of IBM [German ], as discussed below. 

(3) Versions of Lamport's bakery algorithm [Lamport 1974] that allow an arbitrary number
of processes to be active at each step or allow non-atomic reads and writes.

(4) Selection sort algorithm for sorting an arbitrarily large array. We prove the property that
upon termination, the algorithm produces an ordered array.

(5) A model of the Ad-hoc On-demand Distance Vector (AODV) routing protocol [C.Perkins
et al. 2002]. This model was obtained from an earlier work [Das and Dill 2002], where
the protocol was verified using quantified predicates.

(6) A crucial invariant (similar to the one proved in [Arons et al. 2001]) for proving
mutual exclusion for Peterson's [Peterson 1981] mutual exclusion algorithm.



12.1 Directory-based Cache Coherence Protocol

For German's directory-based cache-coherence protocol, an unbounded number of
clients (caches) communicate with a central home process to gain exclusive or shared
access to a memory line. The state of each cache can be {invalid, shared, exclusive}.
The home maintains explicit representations of two lists of clients: those sharing the cache
line (sharer_list) and those for which the home has sent an invalidation request but has
not received an acknowledgment (invalidate_list); this prevents sending duplicate
invalidation messages.

The client places requests {req_shared, req_exclusive} on a channel ch_1 and the home
grants {grant_shared, grant_exclusive} on channel ch_2. The home also sends invalidation
messages invalidate along ch_2. The home grants exclusive access to a client
only when there are no clients sharing a line, i.e. $\forall i$ : sharer_list(i) = false. The
home maintains variables for the current client (current_client) and the current request
(current_command). It also maintains a bit exclusive_granted to indicate that some
client has exclusive access. The cache lines acknowledge invalidation requests with an
invalidate_ack along another channel ch_3. At each step an input cid is generated to denote
the process that is chosen at that step. Details of the protocol operation with single-entry
channels can be found in many previous works including [Pnueli et al. 2001]. We will refer
to this version as german-cache.

Since the modeling language of UCLID does not permit explicit quantifiers in the system,
we model the check for the absence of any sharers, $\forall i$ : sharer_list(i) = false,
in an alternate way. We maintain a Boolean state variable empty_hsl, which assumes an arbitrary
value at each step of operation. We then add an axiom to the system: empty_hsl $\Leftrightarrow$ $\forall i$ :
sharer_list(i) = false $^1$. The quantified test $\forall i$ : sharer_list(i) = false in the
model is replaced by empty_hsl.

In our version of the protocol, each cache communicates with the home process through three
directed unbounded FIFO channels, namely the channels ch_1, ch_2, ch_3. Thus, there are
an unbounded number of unbounded channels, three for each client $^2$. It can be shown that
a client can generate an unbounded number of requests before getting a response from the
home. We refer to this version of the protocol as german-cache-fifo.

Proving Cache Coherence. We first consider the version german-cache, which has been
widely used in many previous works [Pnueli et al. 2001; Emerson and Kahlon 2003;
Baukus et al. 2002], among others, and then consider the extended system german-cache-fifo.
In both cases, the cache coherence property to prove is $\forall i, j$ : cache(i) = exclusive
$\wedge\; i \neq j \Rightarrow$ cache(j) = invalid. All the experiments are performed on a 2.1GHz Pentium
machine running Linux with 1GB of RAM.
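For a single concrete state with finitely many clients, the coherence property is straightforward to evaluate. The sketch below reads the property as requiring every other cache to be invalid whenever some cache is exclusive; the dictionary representation and the function name `coherent` are assumptions made for illustration.

```python
def coherent(cache):
    """If cache i is exclusive, every other cache j must be invalid."""
    return all(cache[j] == "invalid"
               for i in cache if cache[i] == "exclusive"
               for j in cache if j != i)

ok_state = {0: "exclusive", 1: "invalid", 2: "invalid"}
bad_state = {0: "exclusive", 1: "shared", 2: "invalid"}
shared_state = {0: "shared", 1: "shared", 2: "invalid"}
```

The verification problem addressed here is harder than such a check, since the property must hold for an unbounded number of clients in every reachable state.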

12.1.1 Invariant Generation for german-cache. For this version, we derived two induc- 
tive invariants, one which involves a single process index i and other which involves two 
process indices i and j. 

For the single index invariant, we needed to add an auxiliary variable last_granted which
tracks the last client which has been granted exclusive access [Pnueli et al. 2001]. The
inductive invariant which implies the cache coherence property was constructed using the
following set of predicates:

$\mathcal{P} \doteq$ { empty_hsl, exclusive_granted, current_command = req_shared, current_command =
req_exclusive, i = last_granted, invalidate_list(i), sharer_list(i), cache(i) =
exclusive, cache(i) = invalid, ch_2(i) = grant_exclusive, ch_2(i) = grant_shared, ch_2(i) =
invalidate, ch_3(i) = invalidate_ack }.

These predicates naturally appear in the system description. First, the predicates empty_hsl
and exclusive_granted are Boolean state variables. Next, for each enumerated state
variable x, with range $\{e_1, \ldots, e_m\}$, we add the predicates $x = e_1, \ldots, x = e_{m-1}$, leaving
out the redundant predicate $x = e_m$. This explains current_command = req_shared and
current_command = req_exclusive. Next, we consider the values of the function and
predicate state variables at a particular index i. In this example, such state variables are
the sharer_list, invalidate_list, cache, ch_1, ch_2 and ch_3. We did not need to
add any predicate for ch_1 since the content of this channel does not affect the correctness
condition. Finally, the predicate i = last_granted was added for the auxiliary state
variable last_granted.

$^1$ Our current implementation only handles one direction of the axiom, $\forall i$ : empty_hsl $\Rightarrow$ sharer_list(i) =
false, which is sufficient to ensure the safety property.
$^2$ The extension was suggested by Steven German himself.

With this set of 13 indexed predicates, the abstract reachability computation converged 
after 9 iterations in 14 seconds. Most of the time (about 8 seconds) was spent in eliminating 
quantifiers from the formula in (23) and (26) using the SAT-based quantifier elimination 
method. 

For the dual index invariant, addition of the second index variable j makes the process 
computationally more expensive. However, the verification does not require any auxiliary 
variable to prove the correctness. The set of predicates used is: 

V = { cache(i) = exclusive, cache(j) = invalid, i = j, ch2(«) = grant .exclusive, 
ch2(«) = grant ^shared, ch.2(i) = invalidate, ch3(«) = empty, ch2(j) = grant exclusive, 
ch2(j) = grantjhared, ch2(j) = invalidate, ch3(j) = empty, invalidateJList(i), 
current_command = reqjexclusive, current_command = reqjshared, exclusive_granted, 
sharer_list(i), }. 

An inductive invariant that implies cache coherence was constructed from these 16
predicates in 41 seconds, using 12 steps of abstract reachability. About 15 seconds of this
time was spent eliminating quantifiers.
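
To illustrate how predicates over two index variables abstract a system of arbitrary size, the
following Python sketch evaluates three of the predicates above, cache(i) = exclusive,
cache(j) = invalid and i = j, for every pair (i, j) of a concrete state. It is a sketch under
assumptions rather than our tool's algorithm: the function names abstract and violates are
hypothetical, the cache value "shared" is assumed, and the coherence condition is assumed to
have the form "if cache(i) is exclusive and i differs from j, then cache(j) is invalid".

```python
from itertools import product
from typing import List, Set, Tuple

Valuation = Tuple[bool, bool, bool]


def abstract(cache: List[str]) -> Set[Valuation]:
    """Map a concrete cache array to the set of predicate valuations.

    Each valuation is (cache(i) = exclusive, cache(j) = invalid, i = j),
    taken over all pairs of process indices (i, j).
    """
    n = len(cache)
    return {
        (cache[i] == "exclusive", cache[j] == "invalid", i == j)
        for i, j in product(range(n), repeat=2)
    }


def violates(v: Valuation) -> bool:
    """True if the valuation contradicts the assumed coherence condition."""
    exclusive_i, invalid_j, same_index = v
    return exclusive_i and not same_index and not invalid_j


# Regardless of the number of processes, at most 2**3 = 8 valuations exist.
coherent = abstract(["exclusive", "invalid", "invalid"])
incoherent = abstract(["exclusive", "shared"])
```

With k predicates there are at most 2^k valuations, so the set returned by abstract stays
bounded even as the number of processes grows.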

12.1.2 Invariant Generation for german-cache-fifo. For this version, each of the channels,
namely ch1, ch2 and ch3, is modeled as an unbounded FIFO buffer. Each channel
has a head pointer (e.g., ch1_hd), which is the position of the earliest element in the queue, and a
tail pointer (e.g., ch1_tl), which is the position of the first free entry of the queue, where
the next element is inserted. These pointers are modeled as function state variables, which
map process i to the value of the head or tail pointer of a channel for that process. For
instance, ch2_hd(i) denotes the position of the head pointer for process i. The channel
itself is modeled as a two-dimensional array, where ch2(i, j) denotes the content of the
channel at index j for process i. We aim to derive an invariant over a single process
index i and an index j for an arbitrary element of the channels. Hence we add the auxiliary
variable last_granted. The set of predicates required for this model is:

P = { cache(i) = exclusive, cache(i) = invalid, i = last_granted, current_command =
req_shared, current_command = req_exclusive, exclusive_granted, invalidate_list(i),
sharer_list(i), j = ch2_hd(i), j = ch3_hd(i), j < ch2_hd(i), j < ch2_tl(i),
j < ch3_hd(i), j < ch3_tl(i), j = ch2_tl(i) - 1, ch1_hd(i) < ch1_tl(i), ch1_hd(i) =
ch1_tl(i), ch2_hd(i) < ch2_tl(i), ch2_hd(i) = ch2_tl(i), ch2(i, j) = grant_exclusive,
ch2(i, j) = grant_shared, ch2(i, j) = invalidate, ch3_hd(i) < ch3_tl(i), ch3_hd(i) =
ch3_tl(i), ch3_tl(i) = ch3_hd(i) + 1, ch3(i, j) = invalidate_ack }.

Apart from the predicates required for german-cache, we require predicates involving entries
in the various channels for a particular cache entry i. Predicates like ch1_hd(i) <
ch1_tl(i) and ch1_hd(i) = ch1_tl(i) are used to determine whether the particular channel is
non-empty. To reason about active entries in a FIFO, i.e., those lying between the head (inclusive)
and the tail, we need predicates like j < ch2_hd(i) and j < ch2_tl(i). The content
of the channel at a location j is given by predicates like ch2(i, j) = grant_exclusive
and ch3(i, j) = invalidate_ack. Finally, a couple of predicates like ch3_tl(i) = ch3_hd(i) + 1
and j = ch2_tl(i) - 1 were added after examining failures to prove the cache coherence
property.
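
The role of the head and tail predicates can be made concrete with a small Python model of one
channel of a single process i. This is a minimal sketch, assuming the class and function names
shown (Channel, is_empty, is_active and so on), which are hypothetical and not part of the model
description; each helper corresponds to one of the predicates above, as noted in its comment.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Channel:
    """Unbounded FIFO of one process: hd is the earliest element, tl the first free entry."""
    hd: int = 0
    tl: int = 0
    data: Dict[int, str] = field(default_factory=dict)

    def push(self, msg: str) -> None:
        self.data[self.tl] = msg   # the next element is inserted at the tail
        self.tl += 1

    def pop(self) -> str:
        msg = self.data.pop(self.hd)  # remove the earliest element
        self.hd += 1
        return msg


def is_empty(ch: Channel) -> bool:
    return ch.hd == ch.tl            # ch_hd(i) = ch_tl(i)


def is_nonempty(ch: Channel) -> bool:
    return ch.hd < ch.tl             # ch_hd(i) < ch_tl(i)


def is_active(ch: Channel, j: int) -> bool:
    # Between head (inclusive) and tail: not (j < ch_hd(i)) and j < ch_tl(i)
    return not (j < ch.hd) and j < ch.tl


def is_last(ch: Channel, j: int) -> bool:
    return j == ch.tl - 1            # j = ch_tl(i) - 1


def has_one_entry(ch: Channel) -> bool:
    return ch.tl == ch.hd + 1        # ch_tl(i) = ch_hd(i) + 1


ch2 = Channel()
ch2.push("grant_shared")
ch2.push("invalidate")
```

In this reading, ch3_tl(i) = ch3_hd(i) + 1 states that the channel holds exactly one entry, and
j = ch2_tl(i) - 1 identifies the most recently inserted entry.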

With these 26 predicates, our tool constructs an inductive invariant that implies the cache
coherence property. The abstract reachability computation took 17 iterations and 1435 seconds
to converge, of which the quantifier elimination process took 1227 seconds.
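
The share of run time spent in quantifier elimination for the three experiments in this section
can be computed directly from the reported figures. The short Python sketch below does this
arithmetic; the function name qe_share and the labels are ours, and the 8-second and 15-second
figures are the approximate values quoted above.

```python
def qe_share(total_seconds: float, qe_seconds: float) -> float:
    """Percentage of the total run time spent in quantifier elimination."""
    return 100.0 * qe_seconds / total_seconds


# (total seconds, quantifier-elimination seconds) as reported in the text
runs = {
    "german-cache, single index": (14, 8),
    "german-cache, dual index": (41, 15),
    "german-cache-fifo": (1435, 1227),
}

for name, (total, qe) in runs.items():
    print(f"{name}: {qe_share(total, qe):.1f}% of {total} s")
```

The shares are roughly 57%, 37% and 86%, so quantifier elimination accounts for most of the run
time in the FIFO model.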